Abstract

We derive the optimal $\epsilon$-differentially private mechanism for a single real-valued query function under a very general utility-maximization (or cost-minimization) framework. The class of noise probability distributions in the optimal mechanism has staircase-shaped probability density functions which are symmetric (around the origin), monotonically decreasing and geometrically decaying. The staircase mechanism can be viewed as a geometric mixture of uniform probability distributions, providing a simple algorithmic description for the mechanism. Furthermore, the staircase mechanism naturally generalizes to discrete query output settings as well as more abstract settings. We explicitly derive the optimal noise probability distributions with minimum expectation of noise amplitude and power. Comparing the optimal performances with those of the Laplacian mechanism, we show that in the high privacy regime ($\epsilon$ is small), the Laplacian mechanism is asymptotically optimal as $\epsilon \to 0$; in the low privacy regime ($\epsilon$ is large), the minimum expectation of noise amplitude and minimum noise power are $\Theta(\Delta e^{-\epsilon/2})$ and $\Theta(\Delta^2 e^{-2\epsilon/3})$ as $\epsilon \to +\infty$, while the expectation of noise amplitude and power using the Laplacian mechanism are $\frac{\Delta}{\epsilon}$ and $\frac{2\Delta^2}{\epsilon^2}$, where $\Delta$ is the sensitivity of the query function. We conclude that the gains are more pronounced in the low privacy regime.

I. Introduction 

A. Background and Motivation 

Differential privacy is a rigorous framework to quantify to what extent individual privacy in a statistical database is preserved 
while releasing useful statistical information about the database [1]. The basic idea of differential privacy is that the presence
of any individual data in the database should not affect the final released statistical information significantly, and thus it can 
give strong privacy guarantees against an adversary with arbitrary auxiliary information. For more background and motivation 
of differential privacy, we refer the readers to the excellent survey Q. 

The standard approach to preserving $\epsilon$-differential privacy for a real-valued query function, i.e., when the released statistical information is real-valued, is the Laplacian mechanism, which adds noise with Laplace distribution to the output of the query function. While differential privacy guarantees can be achieved by the Laplacian mechanism, which is the state of the art, it is not clear whether there exist other noise probability distributions which can outperform the Laplace distribution in the same problem setting.

B. Our Result 

We answer the above question affirmatively in this work. Our main result is to show the optimality of a class of noise probability distributions for a single real-valued query function under a very general utility-maximization (or cost-minimization) framework. The class of optimal noise probability distributions has staircase-shaped probability density functions which are symmetric (around the origin), monotonically decreasing and geometrically decaying for $x > 0$. The staircase mechanism can be viewed as a geometric mixture of uniform probability distributions, providing a simple algorithmic description for the mechanism. Furthermore, the staircase mechanism naturally generalizes to discrete query output settings as well as more abstract settings. We show that adding query-output-independent noise with staircase distribution is optimal among all randomized mechanisms (subject to a mild technical condition) that preserve differential privacy.

We explicitly derive the optimal noise probability distributions with minimum expectation of noise amplitude and power. Comparing the optimal performances with those of the Laplacian mechanism, we show that in the high privacy regime ($\epsilon$ is small), the Laplacian mechanism is asymptotically optimal as $\epsilon \to 0$; in the low privacy regime ($\epsilon$ is large), the minimum expectation of noise amplitude and minimum noise power are $\Theta(\Delta e^{-\epsilon/2})$ and $\Theta(\Delta^2 e^{-2\epsilon/3})$ as $\epsilon \to +\infty$, while the expectation of noise amplitude and power using the Laplacian mechanism are $\frac{\Delta}{\epsilon}$ and $\frac{2\Delta^2}{\epsilon^2}$, where $\Delta$ is the sensitivity of the query function. We conclude that the gains are more pronounced in the low privacy regime.

C. Connection to Existing Works 

1) Laplacian Mechanism vs. Staircase Mechanism: The Laplacian mechanism is specified by two parameters, $\epsilon$ and the query function sensitivity $\Delta$; together, $\epsilon$ and $\Delta$ completely characterize the differential privacy constraint. On the other hand, the staircase mechanism is specified by three parameters, $\epsilon$, $\Delta$, and $\gamma^*$, which is determined by $\epsilon$ and the utility function/cost function. For certain classes of utility functions/cost functions, there are closed-form expressions for the optimal $\gamma^*$. We plot the probability density functions of the Laplacian mechanism and the staircase mechanism in Figure 1. Figure 3 in Section III gives a precise description of the staircase mechanism.

(a) Laplace Mechanism (b) Staircase Mechanism

Fig. 1: Probability Density Functions of Laplacian Mechanism and Staircase Mechanism



From the two examples given in Section IV, we can see that although the Laplacian mechanism is not strictly optimal, in the high privacy regime ($\epsilon \to 0$) the Laplacian mechanism is asymptotically optimal:

• For the expectation of noise amplitude, the additive gap from the optimal value goes to 0 as $\epsilon \to 0$,

• For noise power, the additive gap from the optimal value is upper bounded by a constant as $\epsilon \to 0$.

However, in the low privacy regime ($\epsilon \to +\infty$), the multiplicative gap from the optimal values can be arbitrarily large. We conclude that in the high privacy regime, the Laplacian mechanism is nearly optimal, while in the low privacy regime significant improvement can be achieved by using the staircase mechanism. We plot the multiplicative gain of the staircase mechanism over the Laplacian mechanism for expectation of noise amplitude and noise power in Figure 2, where $V_{\text{Optimal}}$ is the optimal (minimum) cost, which is achieved by the staircase mechanism, and $V_{\text{Lap}}$ is the cost of the Laplacian mechanism. We can see that even for modest $\epsilon \approx 10$, the staircase mechanism has about 15-fold and 23-fold improvement, with noise amplitude and power respectively.

Since the staircase mechanism is derived under the same problem setting as the Laplacian mechanism, the staircase mechanism can be applied wherever the Laplacian mechanism is used, and it performs strictly better than the Laplacian mechanism (and significantly better in low privacy scenarios).





(a) $0 < \epsilon \le 10$ (b) $10 \le \epsilon \le 20$

Fig. 2: Multiplicative Gain of the Staircase Mechanism over the Laplacian Mechanism. 



2) Worst-case Result: We emphasize that the staircase mechanism is a worst-case optimal result. We impose no further assumptions on the properties of the query function $q$ beyond its sensitivity. If we know more properties that $q$ satisfies, it is entirely possible that the staircase mechanism is not the best. For example, if we know that the range of $q$ is $\mathbb{Z}$, then we do not need to add non-integer noise, in which case discrete probability distributions are better than the staircase mechanism.

3) Related Works: Several works study the optimal noise distributions satisfying the differential privacy constraint under a utility-maximization framework; e.g., [3], [4], [5], [6], [7], [8], [9], [10], [11], [12]. In particular, [4] shows that for a single count query with sensitivity $\Delta = 1$, for a general class of utility functions, to minimize the expected cost under a Bayesian framework the optimal mechanism to preserve differential privacy is the geometric mechanism, which adds noise with geometric distribution. [6] shows that for general query functions, no universally optimal differential privacy mechanisms exist. [5] derives the optimal noise probability distributions for a single count query with sensitivity $\Delta = 1$ for minimax (risk-averse) users. [5] shows that although there is no universally optimal solution to the minimax optimization problem in [5] for a general class of cost functions, each solution (corresponding to different cost functions) can be derived from the same geometric mechanism.

The differences between our work and [4], [5] are: [4], [5] study a single count query function, which is integer-valued and has fixed sensitivity $\Delta = 1$, while we study general real-valued query functions, and generalize the result to integer-valued query functions with arbitrary sensitivity $\Delta$. In the case $\Delta = 1$ in the discrete setting, we show that the staircase mechanism reduces to the geometric mechanism, as in [4], [5].

D. Organization 

The paper is organized as follows. We formulate the utility-maximization/cost-minimization under the differential privacy constraint as a functional optimization problem in Section II. We present the solution (including an algorithmic representation) and our main result, Theorem 1, in Section III; the detailed proof is given in Appendix A. In Section IV, we apply our main result to derive the optimal noise probability distributions with minimum expectation of noise amplitude and power, respectively, and compare the performances with the Laplacian mechanism. Section V presents the asymptotic properties of $\gamma^*$ in the staircase mechanism for momentum cost functions, and suggests a heuristic choice of $\gamma$ that appears to work well for a wide class of cost functions. Section VI proves that adding query-output-independent noise with staircase distribution is optimal among all randomized differentially private mechanisms under a technical condition. Section VII extends the staircase mechanism to the abstract setting, and Section VIII generalizes the staircase mechanism to integer-valued query functions in the discrete setting.



II. Problem Formulation 



Consider a real-valued query function

$$q : \mathcal{D}^n \to \mathbb{R}, \tag{1}$$

where $\mathcal{D}^n$ is the domain of the databases.

The sensitivity of the query function $q$ is defined as

$$\Delta \triangleq \max_{D_1, D_2 \subseteq \mathcal{D}^n : |D_1 - D_2| \le 1} |q(D_1) - q(D_2)|, \tag{2}$$

where the maximum is taken over all possible pairs of neighboring database entries $D_1$ and $D_2$ which differ in at most one element, i.e., one is a proper subset of the other and the larger database contains just one additional element [2].

Definition 1 ($\epsilon$-differential privacy [2]). A randomized mechanism $\mathcal{K}$ gives $\epsilon$-differential privacy if for all data sets $D_1$ and $D_2$ differing on at most one element, and all $S \subset \text{Range}(\mathcal{K})$,

$$\Pr[\mathcal{K}(D_1) \in S] \le \exp(\epsilon) \Pr[\mathcal{K}(D_2) \in S]. \tag{3}$$

The standard approach to preserving differential privacy is to add noise to the output of the query function. Let $q(D)$ be the value of the query function evaluated at $D \subseteq \mathcal{D}^n$; the noise-adding mechanism $\mathcal{K}$ will output

$$\mathcal{K}(D) = q(D) + X, \tag{4}$$

where $X$ is the noise added by the mechanism to the output of the query function.

In the following we derive the differential privacy constraint on the probability distribution of $X$ from (3):

$$\Pr[\mathcal{K}(D_1) \in S] \le \exp(\epsilon) \Pr[\mathcal{K}(D_2) \in S] \tag{5}$$

$$\Leftrightarrow \Pr[q(D_1) + X \in S] \le \exp(\epsilon) \Pr[q(D_2) + X \in S] \tag{6}$$

$$\Leftrightarrow \Pr[X \in S - q(D_1)] \le \exp(\epsilon) \Pr[X \in S - q(D_2)] \tag{7}$$

$$\Leftrightarrow \Pr[X \in S'] \le \exp(\epsilon) \Pr[X \in S' + q(D_1) - q(D_2)], \tag{8}$$

where $S' \triangleq S - q(D_1) = \{s - q(D_1) \mid s \in S\}$.

Since (3) holds for all measurable sets $S \subseteq \mathbb{R}$, and $|q(D_1) - q(D_2)| \le \Delta$, from (8) we have

$$\Pr[X \in S'] \le \exp(\epsilon) \Pr[X \in S' + d], \tag{9}$$

for all measurable sets $S' \subseteq \mathbb{R}$ and for all $|d| \le \Delta$.

Consider a cost function $\mathcal{L}(\cdot)$ which is a function of the added noise $X$. Our goal is to minimize the expectation of the cost subject to the $\epsilon$-differential privacy constraint (9).

More precisely, let $\mathcal{P}$ denote the probability distribution of $X$ and let $\mathcal{P}(S)$ denote the probability $\Pr[X \in S]$. The optimization problem we study in this paper is

$$\underset{\mathcal{P}}{\text{minimize}} \ \int_{x \in \mathbb{R}} \mathcal{L}(x)\,\mathcal{P}(dx) \tag{10}$$

$$\text{subject to } \ \mathcal{P}(S) \le e^{\epsilon}\,\mathcal{P}(S + d), \ \forall \text{ measurable set } S \subseteq \mathbb{R}, \ \forall |d| \le \Delta. \tag{11}$$

Our main contribution is to solve the above functional optimization problem and derive the optimal noise probability distribution for a general class of cost functions $\mathcal{L}(\cdot)$.
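As a baseline for the optimization problem above, the objective (10) can be estimated by simulation for one feasible noise distribution. The sketch below is illustrative only: it assumes Laplace noise with scale $\Delta/\epsilon$, and the names `laplace_noise`, `noisy_answer` and `expected_cost` are hypothetical helpers, not part of the paper.

```python
import random


def laplace_noise(eps, delta, rng):
    """Draw Laplace noise with scale delta / eps (random sign times exponential)."""
    magnitude = rng.expovariate(eps / delta)   # exponential with mean delta / eps
    return magnitude if rng.random() < 0.5 else -magnitude


def noisy_answer(query_value, eps, delta, rng):
    """Noise-adding mechanism K(D) = q(D) + X from (4)."""
    return query_value + laplace_noise(eps, delta, rng)


def expected_cost(cost, eps, delta, rng, n=100_000):
    """Monte Carlo estimate of the objective (10) for Laplace noise."""
    return sum(cost(laplace_noise(eps, delta, rng)) for _ in range(n)) / n
```

For example, `expected_cost(abs, 0.5, 1.0, random.Random(7))` should come out close to $\Delta/\epsilon = 2$, the amplitude cost of the Laplacian mechanism.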
Fig. 3: The Staircase-Shaped Probability Density Function $f_\gamma(x)$



III. Main Result 

In this section we state our main result, Theorem 1. The detailed proof is provided in Appendix A.
We assume that the cost function $\mathcal{L}(\cdot)$ satisfies two (natural) properties.

Property 1. $\mathcal{L}(x)$ is a symmetric function, and monotonically increasing for $x > 0$, i.e., $\mathcal{L}(x)$ satisfies

$$\mathcal{L}(x) = \mathcal{L}(-x), \ \forall x \in \mathbb{R}, \tag{12}$$

and

$$\mathcal{L}(x) \le \mathcal{L}(y), \ \forall\, 0 \le x \le y. \tag{13}$$

In addition, we assume $\mathcal{L}(x)$ satisfies a mild technical condition which essentially says that $\mathcal{L}(\cdot)$ cannot increase too fast (while still allowing it to be unbounded).

Property 2. There exists a positive integer $T$ such that $\mathcal{L}(T) > 0$ and $\mathcal{L}(x)$ satisfies

$$\sup_{x \ge T} \frac{\mathcal{L}(x+1)}{\mathcal{L}(x)} < +\infty. \tag{14}$$

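Consider a class of probability distributions with symmetric and staircase-shaped probability density functions defined as follows. Given $\gamma \in [0, 1]$, define $\mathcal{P}_\gamma$ as the probability distribution with probability density function $f_\gamma(\cdot)$ defined as

$$f_\gamma(x) = \begin{cases} a(\gamma) & x \in [0, \gamma\Delta) \\ e^{-\epsilon} a(\gamma) & x \in [\gamma\Delta, \Delta) \\ e^{-k\epsilon} f_\gamma(x - k\Delta) & x \in [k\Delta, (k+1)\Delta) \text{ for } k \in \mathbb{N} \\ f_\gamma(-x) & x < 0 \end{cases} \tag{15}$$

where

$$a(\gamma) \triangleq \frac{1 - e^{-\epsilon}}{2\Delta\left(\gamma + e^{-\epsilon}(1 - \gamma)\right)}. \tag{16}$$

It is straightforward to verify that $f_\gamma(\cdot)$ is a valid probability density function and $\mathcal{P}_\gamma$ satisfies the differential privacy constraint (11). Indeed, the probability density function $f_\gamma(x)$ satisfies

$$f_\gamma(x) \le e^{\epsilon} f_\gamma(x + d), \ \forall x \in \mathbb{R}, \ |d| \le \Delta, \tag{17}$$

which implies (11).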
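The two claims above (normalization and the pointwise ratio bound (17)) can be spot-checked numerically. The following is a minimal sketch, not a proof: `staircase_pdf` and `check_ratio` are hypothetical names, and the check only covers a finite grid of points $x$ and shifts $d$.

```python
import math


def staircase_pdf(x, eps, delta, gamma):
    """Density f_gamma(x) from (15)-(16)."""
    b = math.exp(-eps)
    a = (1.0 - b) / (2.0 * delta * (gamma + b * (1.0 - gamma)))
    x = abs(x)                      # symmetry: f(x) = f(-x)
    k = int(x // delta)             # index of the period [k*delta, (k+1)*delta)
    offset = x - k * delta          # position inside the period
    step = a if offset < gamma * delta else b * a
    return (b ** k) * step


def check_ratio(eps, delta, gamma, grid=400, span=5.0):
    """Grid check of f(x) <= e^eps * f(x + d) for |d| <= delta, as in (17)."""
    xs = [-span * delta + 2 * span * delta * i / grid for i in range(grid + 1)]
    ds = [-delta + 2 * delta * j / 40 for j in range(41)]
    bound = math.exp(eps) * (1.0 + 1e-12)   # tiny slack for rounding
    return all(
        staircase_pdf(x, eps, delta, gamma)
        <= bound * staircase_pdf(x + d, eps, delta, gamma)
        for x in xs for d in ds
    )
```

Grid points that land exactly on a jump of the density, $x = (k + \gamma)\Delta$, are sensitive to floating-point rounding, so it is safest to pick a $\gamma$ that keeps the grid away from those points.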

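Let $\mathcal{SP}$ be the set of all probability distributions which satisfy the differential privacy constraint (11). Our main result is

Theorem 1. If the cost function $\mathcal{L}(x)$ satisfies Property 1 and Property 2, then

$$\inf_{\mathcal{P} \in \mathcal{SP}} \int_{x \in \mathbb{R}} \mathcal{L}(x)\,\mathcal{P}(dx) = \inf_{\gamma \in [0,1]} \int_{x \in \mathbb{R}} \mathcal{L}(x) f_\gamma(x)\,dx. \tag{18}$$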



Proof: See Appendix A. ■

Algorithm 1 Generation of Random Variable with Staircase Distribution
Input: $\epsilon$, $\Delta$, and $\gamma \in [0, 1]$.
Output: $X$, a random variable (r.v.) with staircase distribution specified by $\epsilon$, $\Delta$ and $\gamma$.
Generate a r.v. $S$ with $\Pr[S = 1] = \Pr[S = -1] = \frac{1}{2}$.
Generate a geometric r.v. $G$ with $\Pr[G = i] = (1 - b)b^i$ for integer $i \ge 0$, where $b = e^{-\epsilon}$.
Generate a r.v. $U$ uniformly distributed in $[0, 1]$.
Generate a binary r.v. $B$ with $\Pr[B = 0] = \frac{\gamma}{\gamma + (1 - \gamma)b}$ and $\Pr[B = 1] = \frac{(1 - \gamma)b}{\gamma + (1 - \gamma)b}$.
$X \leftarrow S\left((1 - B)\left((G + \gamma U)\Delta\right) + B\left((G + \gamma + (1 - \gamma)U)\Delta\right)\right)$.
Output $X$.



Therefore, the optimal noise probability distribution to preserve $\epsilon$-differential privacy for any real-valued query function has a staircase-shaped probability density function, which is specified by three parameters $\epsilon$, $\Delta$ and $\gamma^* = \arg\min_{\gamma \in [0,1]} \int_{x \in \mathbb{R}} \mathcal{L}(x) f_\gamma(x)\,dx$.

A natural and simple algorithm to generate random noise with staircase distribution is given in Algorithm 1.
In the formula,

$$X \leftarrow S\left((1 - B)\left((G + \gamma U)\Delta\right) + B\left((G + \gamma + (1 - \gamma)U)\Delta\right)\right), \tag{19}$$

• $S$ determines the sign of the noise,

• $G$ determines which interval $[G\Delta, (G + 1)\Delta)$ the noise lies in,

• $B$ determines which of the subintervals $[G\Delta, (G + \gamma)\Delta)$ and $[(G + \gamma)\Delta, (G + 1)\Delta)$ the noise lies in,

• $U$ helps to uniformly sample the subinterval.
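The four random variables above translate almost line by line into code. The following is a minimal sketch of Algorithm 1 using only the Python standard library; the function name `sample_staircase` and the inverse-CDF draw of the geometric variable are choices made here for illustration, not prescribed by the paper.

```python
import math
import random


def sample_staircase(eps, delta, gamma, rng):
    """Draw one noise sample following the steps of Algorithm 1."""
    b = math.exp(-eps)
    # S: random sign, +1 or -1 with probability 1/2 each.
    sign = 1 if rng.random() < 0.5 else -1
    # G: geometric on {0, 1, 2, ...} with Pr[G >= i] = b**i,
    # drawn by inverting the CDF with a uniform variate v in (0, 1].
    v = 1.0 - rng.random()
    g = int(math.floor(-math.log(v) / eps))
    # U: uniform on [0, 1).
    u = rng.random()
    # B: 0 selects the inner step [G*delta, (G+gamma)*delta), 1 the outer step.
    p_inner = gamma / (gamma + (1.0 - gamma) * b)
    if rng.random() < p_inner:
        magnitude = (g + gamma * u) * delta
    else:
        magnitude = (g + gamma + (1.0 - gamma) * u) * delta
    return sign * magnitude
```

A private release of a query value `q` would then be `q + sample_staircase(eps, delta, gamma, random.Random())`, with `gamma` chosen for the cost function of interest.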

IV. Application 

In this section, we apply our main result, Theorem 1, to derive the optimal noise probability distributions with minimum expectation of noise amplitude and with minimum power, respectively, and then compare the performances with the Laplacian mechanism.

A. Optimal Noise Probability Distribution with Minimum Expectation of Noise Amplitude

To minimize the expectation of amplitude, we have cost function $\mathcal{L}(x) = |x|$, and it is easy to see that it satisfies Property 1 and Property 2.

To simplify notation, define $b \triangleq e^{-\epsilon}$, and define

$$V(\mathcal{P}) \triangleq \int_{x \in \mathbb{R}} \mathcal{L}(x)\,\mathcal{P}(dx) \tag{20}$$

for a given probability distribution $\mathcal{P}$.

Theorem 2. To minimize the expectation of the amplitude of noise, the optimal noise probability distribution is $\mathcal{P}_{\gamma^*}$ with

$$\gamma^* = \frac{1}{1 + e^{\epsilon/2}}, \tag{21}$$

and the minimum expectation of noise amplitude is

$$V(\mathcal{P}_{\gamma^*}) = \Delta \frac{e^{\epsilon/2}}{e^{\epsilon} - 1}. \tag{22}$$

Proof: See Appendix B. ■
Next, we compare the performances of the optimal noise probability distribution and the Laplacian mechanism. The Laplace distribution has probability density function

$$f(x) = \frac{1}{2\lambda} e^{-\frac{|x|}{\lambda}}, \tag{23}$$

where $\lambda = \frac{\Delta}{\epsilon}$. So the expectation of the amplitude of noise with Laplace distribution is

$$V_{\text{Lap}} \triangleq \int_{-\infty}^{+\infty} |x| f(x)\,dx = \frac{\Delta}{\epsilon}. \tag{24}$$

By comparing $V(\mathcal{P}_{\gamma^*})$ and $V_{\text{Lap}}$, it is easy to see that in the high privacy regime ($\epsilon$ is small) the Laplacian mechanism is asymptotically optimal, and the additive gap from the optimal value goes to 0 as $\epsilon \to 0$; in the low privacy regime ($\epsilon$ is large), $V_{\text{Lap}} = \Theta(\frac{\Delta}{\epsilon})$, while $V(\mathcal{P}_{\gamma^*}) = \Theta(\Delta e^{-\epsilon/2})$. Indeed,

Fig. 4: Optimal $\gamma^*$ for cost function $\mathcal{L}(x) = x^2$



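Corollary 3. Consider the cost function $\mathcal{L}(x) = |x|$. In the high privacy regime ($\epsilon$ is small),

$$V_{\text{Lap}} - V(\mathcal{P}_{\gamma^*}) = \Delta\left(\frac{\epsilon}{24} - \frac{7\epsilon^3}{5760} + O(\epsilon^5)\right), \tag{25}$$

as $\epsilon \to 0$.

And in the low privacy regime ($\epsilon$ is large),

$$V_{\text{Lap}} = \frac{\Delta}{\epsilon}, \tag{26}$$

$$V(\mathcal{P}_{\gamma^*}) = \Theta(\Delta e^{-\epsilon/2}), \tag{27}$$

as $\epsilon \to +\infty$.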
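The closed forms (21), (22) and (24) are easy to evaluate side by side. The sketch below uses hypothetical function names and simply plugs numbers into those formulas; it reproduces the roughly 15-fold amplitude gain quoted for $\epsilon \approx 10$ and the $\Delta\epsilon/24$ gap for small $\epsilon$.

```python
import math


def gamma_star_amplitude(eps):
    """Optimal gamma for the cost |x|, from (21)."""
    return 1.0 / (1.0 + math.exp(eps / 2.0))


def optimal_amplitude(eps, delta):
    """Minimum expected |noise| from (22): delta * e^(eps/2) / (e^eps - 1)."""
    return delta * math.exp(eps / 2.0) / math.expm1(eps)


def laplace_amplitude(eps, delta):
    """Expected |noise| of the Laplacian mechanism from (24): delta / eps."""
    return delta / eps


if __name__ == "__main__":
    for eps in (0.1, 1.0, 5.0, 10.0):
        gain = laplace_amplitude(eps, 1.0) / optimal_amplitude(eps, 1.0)
        print(f"eps={eps:5.1f}  gamma*={gamma_star_amplitude(eps):.4f}  gain={gain:.2f}")
```

The ratio simplifies to $2\sinh(\epsilon/2)/\epsilon$, which makes the exponential growth of the gain in the low privacy regime explicit.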



B. Optimal Noise Probability Distribution with Minimum Power 

Given the probability distribution $\mathcal{P}$ of the noise, the power of noise is defined as $\int_{x \in \mathbb{R}} x^2\,\mathcal{P}(dx)$. Accordingly, the cost function is $\mathcal{L}(x) = x^2$, and it is easy to see that it satisfies Property 1 and Property 2.

Theorem 4. To minimize the power of noise (accordingly, $\mathcal{L}(x) = x^2$), the optimal noise probability distribution is $\mathcal{P}_{\gamma^*}$ with

$$\gamma^* = -\frac{b}{1 - b} + \frac{(b - 2b^2 + 2b^4 - b^5)^{1/3}}{2^{1/3}(1 - b)^2}, \tag{28}$$

and the minimum power of noise is

$$V(\mathcal{P}_{\gamma^*}) = \Delta^2\, \frac{2^{-2/3}\, b^{2/3} (1 + b)^{2/3} + b}{(1 - b)^2}. \tag{29}$$

Proof: See Appendix C. ■
Next, we compare the performances of the optimal noise probability distribution and the Laplacian mechanism. The power of noise with Laplace distribution with $\lambda = \frac{\Delta}{\epsilon}$ is

$$V_{\text{Lap}} \triangleq \int_{-\infty}^{+\infty} x^2 \frac{1}{2\lambda} e^{-\frac{|x|}{\lambda}}\,dx = \frac{2\Delta^2}{\epsilon^2}. \tag{30}$$

By comparing $V(\mathcal{P}_{\gamma^*})$ and $V_{\text{Lap}}$, it is easy to see that in the high privacy regime ($\epsilon$ is small) the Laplacian mechanism is asymptotically optimal, and the additive gap from the optimal value is upper bounded by a constant as $\epsilon \to 0$; in the low privacy regime ($\epsilon$ is large), $V_{\text{Lap}} = \Theta(\frac{2\Delta^2}{\epsilon^2})$, while $V(\mathcal{P}_{\gamma^*}) = \Theta(\Delta^2 e^{-2\epsilon/3})$. Indeed,

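Corollary 5. Consider the cost function $\mathcal{L}(x) = x^2$. In the high privacy regime ($\epsilon$ is small),

$$V_{\text{Lap}} - V(\mathcal{P}_{\gamma^*}) = \Delta^2\left(\frac{1}{12} - \frac{\epsilon^2}{720} + O(\epsilon^4)\right), \tag{31}$$

as $\epsilon \to 0$.

And in the low privacy regime ($\epsilon$ is large),

$$V_{\text{Lap}} = \frac{2\Delta^2}{\epsilon^2}, \tag{32}$$

$$V(\mathcal{P}_{\gamma^*}) = \Theta(\Delta^2 e^{-2\epsilon/3}), \tag{33}$$

as $\epsilon \to +\infty$.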
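The same comparison can be run for noise power. The sketch below evaluates Theorem 4 with $b = e^{-\epsilon}$; the function names are hypothetical, and $\gamma^*$ is computed through the factorisation $b - 2b^2 + 2b^4 - b^5 = b(1-b)^3(1+b)$, an algebraically equivalent rewriting chosen here to avoid cancellation when $b$ is close to 1.

```python
import math


def gamma_star_power(eps):
    """Optimal gamma for the cost x**2 (Theorem 4), in factored form."""
    b = math.exp(-eps)
    # (b(1-b)^3(1+b))^(1/3) / (2^(1/3) (1-b)^2) = (b(1+b)/2)^(1/3) / (1-b)
    return ((b * (1.0 + b) / 2.0) ** (1.0 / 3.0) - b) / (1.0 - b)


def optimal_power(eps, delta):
    """Minimum noise power from Theorem 4."""
    b = math.exp(-eps)
    numerator = 2.0 ** (-2.0 / 3.0) * b ** (2.0 / 3.0) * (1.0 + b) ** (2.0 / 3.0) + b
    return delta ** 2 * numerator / (1.0 - b) ** 2


def laplace_power(eps, delta):
    """Noise power of the Laplacian mechanism: 2 * delta^2 / eps^2."""
    return 2.0 * delta ** 2 / eps ** 2


if __name__ == "__main__":
    for eps in (0.1, 1.0, 5.0, 10.0):
        gain = laplace_power(eps, 1.0) / optimal_power(eps, 1.0)
        print(f"eps={eps:5.1f}  gamma*={gamma_star_power(eps):.4f}  gain={gain:.2f}")
```

At $\epsilon = 10$ the gain evaluates to about 23.6, in line with the 23-fold improvement mentioned in the introduction.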



V. Property of $\gamma^*$

In this section, we derive some asymptotic properties of the optimal $\gamma^*$ for momentum cost functions, and give a heuristic choice of $\gamma$ which only depends on $\epsilon$.

A. Asymptotic Properties of $\gamma^*$

In Section IV, we have seen that for the cost functions $\mathcal{L}(x) = |x|$ and $\mathcal{L}(x) = x^2$, the optimal $\gamma^*$ lies in the interval $[0, \frac{1}{2}]$ for all $\epsilon$ and is a monotonically decreasing function of $\epsilon$; furthermore, $\gamma^* \to \frac{1}{2}$ as $\epsilon$ goes to 0, and $\gamma^* \to 0$ as $\epsilon$ goes to $+\infty$.

We generalize these asymptotic properties of $\gamma^*$ as a function of $\epsilon$ to all momentum cost functions. More precisely, given $m \in \mathbb{N}$ and $m \ge 1$,

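Theorem 6. Consider the cost function $\mathcal{L}(x) = |x|^m$. Let $\gamma^*$ be the optimal $\gamma$ in the staircase mechanism for $\mathcal{L}(x)$, i.e.,

$$\gamma^* = \arg\min_{\gamma \in [0,1]} \int_{x \in \mathbb{R}} |x|^m f_\gamma(x)\,dx. \tag{34}$$

We have

$$\gamma^* \to \frac{1}{2}, \ \text{as } \epsilon \to 0, \tag{35}$$

$$\gamma^* \to 0, \ \text{as } \epsilon \to +\infty. \tag{36}$$

Proof: See Appendix D. ■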

Corollary 7. For all the cost functions $\mathcal{L}(\cdot)$ which can be written as a positive linear combination of momentum functions, the optimal $\gamma^*$ in the staircase mechanism has the following asymptotic properties:

$$\gamma^* \to \frac{1}{2}, \ \text{as } \epsilon \to 0, \tag{37}$$

$$\gamma^* \to 0, \ \text{as } \epsilon \to +\infty. \tag{38}$$
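For momentum cost functions without a closed-form $\gamma^*$, the minimization in (34) can be approximated numerically. The sketch below is one possible approach, not the paper's method: it integrates $|x|^m$ against the staircase density period by period (each piece has an elementary antiderivative) and then grid-searches over $\gamma$. The names `staircase_moment` and `best_gamma`, the truncation rule and the grid size are assumptions made for illustration.

```python
import math


def staircase_moment(m, eps, delta, gamma):
    """E|X|^m for the staircase density f_gamma, summed period by period."""
    b = math.exp(-eps)
    a = (1.0 - b) / (2.0 * delta * (gamma + b * (1.0 - gamma)))

    def segment(u, v):
        # Integral of x**m over [u, v].
        return (v ** (m + 1) - u ** (m + 1)) / (m + 1)

    periods = int(math.ceil(60.0 / eps)) + 1   # b**periods <= e**-60, tail is negligible
    total = 0.0
    for k in range(periods):
        lo, mid, hi = k * delta, (k + gamma) * delta, (k + 1) * delta
        total += (b ** k) * a * (segment(lo, mid) + b * segment(mid, hi))
    return 2.0 * total          # factor 2 accounts for the symmetric negative half


def best_gamma(m, eps, delta=1.0, grid=1000):
    """Grid search for the gamma in [0, 1] minimising E|X|^m."""
    candidates = [i / grid for i in range(grid + 1)]
    return min(candidates, key=lambda g: staircase_moment(m, eps, delta, g))
```

For $m = 1$ and $m = 2$ the search should land within one grid step of the closed forms in Theorem 2 and Theorem 4, which gives a quick consistency check before trying larger $m$.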

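B. A Heuristic Choice of $\gamma$

We have shown that in general the optimal $\gamma^*$ in the staircase mechanism depends on both $\epsilon$ and the cost function $\mathcal{L}(\cdot)$. Here we give a heuristic choice of $\gamma$ which depends only on $\epsilon$, and show that the performance is reasonably good in the low privacy regime.

Consider a particular choice of $\gamma$, which is

$$\tilde{\gamma} := \frac{b}{2} = \frac{e^{-\epsilon}}{2}. \tag{39}$$

It is easy to see that $\tilde{\gamma}$ has the same asymptotic properties as the optimal $\gamma^*$ for momentum cost functions, i.e.,

$$\tilde{\gamma} \to 0, \ \text{as } b \to 0, \tag{40}$$

$$\tilde{\gamma} \to \frac{1}{2}, \ \text{as } b \to 1. \tag{41}$$

Furthermore, the probability that the noise magnitude is less than $\frac{e^{-\epsilon}}{2}\Delta$ is approximately $\frac{1}{3}$ in the low privacy regime ($\epsilon \to +\infty$). Indeed,

$$\Pr\left[|X| \le \frac{e^{-\epsilon}}{2}\Delta\right] = \Pr[|X| \le \tilde{\gamma}\Delta] = 2a(\tilde{\gamma})\tilde{\gamma}\Delta = \frac{1 - b}{3 - b}, \tag{42}$$

which goes to $\frac{1}{3}$ as $\epsilon \to +\infty$ (accordingly, $b \to 0$).
On the other hand, for the Laplace mechanism,

$$\Pr\left[|X| \le \frac{e^{-\epsilon}}{2}\Delta\right] = \int_{-\frac{e^{-\epsilon}}{2}\Delta}^{\frac{e^{-\epsilon}}{2}\Delta} \frac{1}{2\lambda} e^{-\frac{|x|}{\lambda}}\,dx = 1 - e^{-\frac{\epsilon e^{-\epsilon}}{2}}, \tag{43}$$

which goes to zero as $\epsilon \to +\infty$.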
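The value of the staircase probability above follows from the definition of $a(\gamma)$ in (16) by direct substitution. As a worked step (writing $\tilde{\gamma} = b/2$ and $b = e^{-\epsilon}$, with no assumptions beyond (16)):

```latex
\begin{aligned}
2\,a(\tilde{\gamma})\,\tilde{\gamma}\,\Delta
  &= \frac{(1-b)\,\tilde{\gamma}}{\tilde{\gamma} + b\,(1-\tilde{\gamma})}
   = \frac{(1-b)\,\frac{b}{2}}{\frac{b}{2} + b\left(1-\frac{b}{2}\right)} \\
  &= \frac{(1-b)\,\frac{b}{2}}{\frac{b}{2}\,(3-b)}
   = \frac{1-b}{3-b}
   \;\longrightarrow\; \frac{1}{3} \quad \text{as } b \to 0 .
\end{aligned}
```

For comparison, the Laplace probability $1 - e^{-\epsilon e^{-\epsilon}/2}$ behaves like $\epsilon e^{-\epsilon}/2$ for large $\epsilon$, so it vanishes while the staircase probability stays near $\frac{1}{3}$.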
VI. Optimality of Staircase Mechanism Among All Randomized Mechanisms

Our main result, Theorem 1, shows that among all noise-adding mechanisms in which the additive noise is query-output-independent, the staircase mechanism is optimal. In this section, we claim that among all randomized mechanisms, adding query-output-independent noise is optimal, and prove this claim under a technical condition.

A. Problem Formulation 

Following the notation used in (3), a generic randomized mechanism $\mathcal{K}$ is a family of probability measures indexed by the query output (denoted by $t$), i.e.,

$$\mathcal{K} = \{\mu_t : t \in \mathbb{R}\}, \tag{44}$$

and the mechanism will output a random variable with probability distribution $\mu_t$ when the query output is $t \in \mathbb{R}$.

The differential privacy constraint is that for any $t_1, t_2 \in \mathbb{R}$ such that $|t_1 - t_2| \le \Delta$, and for any measurable set $S \subset \mathbb{R}$,

$$\mu_{t_1}(S) \le e^{\epsilon} \mu_{t_2}(S). \tag{45}$$

Given a cost function $\mathcal{L}(\cdot)$ satisfying Property 1 and Property 2, the optimization problem that we study is to

$$\text{minimize} \ \sup_{t \in \mathbb{R}} \int_{x \in \mathbb{R}} \mathcal{L}(x - t)\,\mu_t(dx) \tag{46}$$

over all families of probability measures, subject to the differential privacy constraint (45).
Given $\mu_t$, define a probability measure $\nu_t$ via

$$\nu_t(S) \triangleq \mu_t(S + t), \tag{47}$$

for all measurable sets $S \subset \mathbb{R}$. One can view $\nu_t$ as the probability measure of the noise added to the query output $t$. Then, equivalently, the differential privacy constraint (45) is

$$\nu_{t_1}(S) \le e^{\epsilon} \nu_{t_2}(S + t_1 - t_2), \ \forall \text{ measurable set } S \subset \mathbb{R}, \ |t_1 - t_2| \le \Delta, \tag{48}$$

and the optimization problem is to minimize

$$\sup_{t \in \mathbb{R}} \int_{x \in \mathbb{R}} \mathcal{L}(x)\,\nu_t(dx). \tag{49}$$
Our claim is that in the optimal family of probability measures, $\nu_t$ is independent of $t$, i.e., the probability measure of the noise is independent of the query output. If this claim is true, then the staircase mechanism is the optimal mechanism among all randomized mechanisms to achieve $\epsilon$-differential privacy.

We will prove this claim under a technical condition which assumes that $\{\nu_t\}_{t \in \mathbb{R}}$ is piecewise constant and periodic (the period can be arbitrary) in terms of $t$. More precisely, for any positive integer $n$, and for any positive real number $T$, define

$$\mathcal{K}_{T,n} \triangleq \left\{ \{\nu_t\}_{t \in \mathbb{R}} \ \middle|\ \{\nu_t\}_{t \in \mathbb{R}} \text{ satisfies (48)}, \ \nu_t = \nu_{k\frac{T}{n}} \text{ for } t \in \left[k\tfrac{T}{n}, (k+1)\tfrac{T}{n}\right), k \in \mathbb{Z}, \text{ and } \nu_{t+T} = \nu_t, \forall t \in \mathbb{R} \right\}. \tag{50}$$

Theorem 8. Among the collection of families of probability measures in

$$\bigcup_{T > 0} \bigcup_{n \ge 1} \mathcal{K}_{T,n}, \tag{51}$$

the optimal family of probability measures $\{\nu_t\}_{t \in \mathbb{R}}$ to minimize $\sup_{t \in \mathbb{R}} \int_{x \in \mathbb{R}} \mathcal{L}(x)\,\nu_t(dx)$ is that $\forall t \in \mathbb{R}$, $\nu_t$ is the probability measure with probability density function $f_{\gamma^*}(\cdot)$, where

$$\gamma^* = \arg\min_{\gamma \in [0,1]} \int_{x \in \mathbb{R}} \mathcal{L}(x) f_\gamma(x)\,dx. \tag{52}$$

Proof: See Appendix E. ■
More generally, we conjecture that the technical condition can be done away with and optimality of the staircase mechanism 
is true among all randomized mechanisms satisfying the differential privacy constraint. 






VII. Extension to The Abstract Setting 

In this section, we show how to extend the staircase mechanism to the abstract setting. The approach is essentially the same as the exponential mechanism in [13], except that we replace the exponential function by the staircase function.

Consider a privacy mechanism which maps an input from a domain $\mathcal{D}^n$ to some output in a range $\mathcal{R}$. Let $\mu$ be the base measure of $\mathcal{R}$. In addition, we have a cost function $\mathcal{L} : \mathcal{D}^n \times \mathcal{R} \to [0, +\infty)$.

Define $\Delta$ as

$$\Delta \triangleq \max_{r \in \mathcal{R},\ D_1, D_2 \subseteq \mathcal{D}^n : |D_1 - D_2| \le 1} |\mathcal{L}(D_1, r) - \mathcal{L}(D_2, r)|, \tag{53}$$

i.e., the maximum difference of the cost function for any two inputs which differ in only one single value, over all $r \in \mathcal{R}$ [13].

A randomized mechanism $\mathcal{K}$ achieves $\epsilon$-differential privacy if for any $D_1, D_2 \subseteq \mathcal{D}^n$ such that $|D_1 - D_2| \le 1$, and for any measurable subset $S \subset \mathcal{R}$,

$$\Pr[\mathcal{K}(D_1) \in S] \le \exp(\epsilon) \Pr[\mathcal{K}(D_2) \in S]. \tag{54}$$

Definition 2 (staircase mechanism in the abstract setting). For fixed $\gamma \in [0, 1]$, given input $D \in \mathcal{D}^n$, the staircase mechanism in the abstract setting will output an element in $\mathcal{R}$ with the probability distribution defined as

$$\mathcal{P}_D(S) = \frac{\int_{r \in S} f_\gamma(\mathcal{L}(D, r))\,\mu(dr)}{\int_{r \in \mathcal{R}} f_\gamma(\mathcal{L}(D, r))\,\mu(dr)}, \ \forall \text{ measurable set } S \subset \mathcal{R}, \tag{55}$$

where $f_\gamma$ is the staircase-shaped function defined in (15).

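Theorem 9. The staircase mechanism in the abstract setting in Definition 2 achieves $2\epsilon$-differential privacy.

Proof: For any $D_1, D_2 \in \mathcal{D}^n$ such that $|D_1 - D_2| \le 1$, and for any measurable set $S \subset \mathcal{R}$,

$$\mathcal{P}_{D_1}(S) = \frac{\int_{r \in S} f_\gamma(\mathcal{L}(D_1, r))\,\mu(dr)}{\int_{r \in \mathcal{R}} f_\gamma(\mathcal{L}(D_1, r))\,\mu(dr)} \tag{56}$$

$$\le e^{\epsilon}\, \frac{\int_{r \in S} f_\gamma(\mathcal{L}(D_2, r))\,\mu(dr)}{\int_{r \in \mathcal{R}} f_\gamma(\mathcal{L}(D_1, r))\,\mu(dr)} \tag{57}$$

$$\le e^{\epsilon} e^{\epsilon}\, \frac{\int_{r \in S} f_\gamma(\mathcal{L}(D_2, r))\,\mu(dr)}{\int_{r \in \mathcal{R}} f_\gamma(\mathcal{L}(D_2, r))\,\mu(dr)} \tag{58}$$

$$= e^{2\epsilon}\, \mathcal{P}_{D_2}(S), \tag{59}$$

where we have used the property that $f_\gamma(\mathcal{L}(D_1, r)) \le e^{\epsilon} f_\gamma(\mathcal{L}(D_2, r))$ and $f_\gamma(\mathcal{L}(D_2, r)) \le e^{\epsilon} f_\gamma(\mathcal{L}(D_1, r))$ for all $r \in \mathcal{R}$.
Therefore, the staircase mechanism in the abstract setting achieves $2\epsilon$-differential privacy for any $\gamma \in [0, 1]$.

■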

In the case that the output range $\mathcal{R}$ is the set of real numbers $\mathbb{R}$ and the cost function is $\mathcal{L}(d, r) = |r - q(d)|$ for some real-valued query function $q$, the above mechanism reduces to the staircase mechanism in the continuous setting.
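To make Definition 2 concrete, the sketch below instantiates it for a finite output range with the counting measure as base measure $\mu$; that choice, the helper names, and the toy cost function in the usage note are assumptions for illustration. The constant $a(\gamma)$ cancels in the ratio (55), so only the unnormalised staircase shape is needed.

```python
import math
import random


def staircase_weight(c, eps, delta, gamma):
    """Unnormalised staircase function of a cost c >= 0 (shape of (15))."""
    b = math.exp(-eps)
    k = int(c // delta)                          # period index
    inner = (c - k * delta) < gamma * delta      # inner or outer step of the period
    return (b ** k) * (1.0 if inner else b)


def output_distribution(data, outputs, cost, eps, delta, gamma):
    """Probabilities of each candidate output, as in (55) with counting measure."""
    weights = [staircase_weight(cost(data, r), eps, delta, gamma) for r in outputs]
    total = sum(weights)
    return [w / total for w in weights]


def abstract_staircase(data, outputs, cost, eps, delta, gamma, rng):
    """Sample one output r with probability proportional to f_gamma(cost(data, r))."""
    probs = output_distribution(data, outputs, cost, eps, delta, gamma)
    return rng.choices(outputs, weights=probs, k=1)[0]
```

For instance, with `cost = lambda data, r: abs(r - sum(data))` and entries in $[0, 1]$ (so $\Delta = 1$), the output probabilities for two neighboring inputs differ by a factor of at most $e^{2\epsilon}$, matching Theorem 9.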

VIII. Extension to The Discrete Setting 

In this section, we extend our main result, Theorem 1, to the discrete setting, and show that the optimal noise-adding mechanism in the discrete setting is a discrete variant of the staircase mechanism in the continuous setting.

A. Problem Formulation

We first give the problem formulation in the discrete setting.
Consider an integer-valued query function¹

$$q : \mathcal{D}^n \to \mathbb{Z}, \tag{60}$$

where $\mathcal{D}^n$ is the domain of the databases. Let $\Delta$ denote the sensitivity of the query function $q$ as defined in (2). Clearly, $\Delta$ is an integer in this discrete setting.

Let $q(D)$ be the value of the query function evaluated at $D \subseteq \mathcal{D}^n$. The noise-adding mechanism $\mathcal{K}$ will output

$$\mathcal{K}(D) = q(D) + X, \tag{61}$$

where $X$ is the noise added by the mechanism to the output of the query function. To make the output of the mechanism valid, i.e., $q(D) + X \in \mathbb{Z}$, $X$ can only take integer values.

¹Without loss of generality, we assume that in the discrete setting the query output is integer-valued. Indeed, any uniformly-spaced discrete setting can be reduced to the integer-valued setting by scaling the query output.



Fig. 5: The Staircase-Shaped Probability Mass Function $\mathcal{P}_r(i)$



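Let $\mathcal{P}$ be the probability mass function of the noise $X$, so that $\mathcal{P}(i)$ denotes $\Pr[X = i]$. Then it is easy to derive the $\epsilon$-differential privacy constraint on $\mathcal{P}$, which is

$$\mathcal{P}(i) \le e^{\epsilon}\,\mathcal{P}(i + d), \ \forall i \in \mathbb{Z}, \ d \in \mathbb{Z}, \ |d| \le |\Delta|. \tag{62}$$

Consider a cost function $\mathcal{L}(\cdot) : \mathbb{Z} \to \mathbb{R}$, which is a function of the added noise $X$. Our goal is to minimize the expectation of the cost subject to the $\epsilon$-differential privacy constraint (62):

$$\underset{\mathcal{P}}{\text{minimize}} \ \sum_{i=-\infty}^{+\infty} \mathcal{L}(i)\,\mathcal{P}(i) \tag{63}$$

$$\text{subject to } \ \mathcal{P}(i) \le e^{\epsilon}\,\mathcal{P}(i + d), \ \forall i \in \mathbb{Z}, \ d \in \mathbb{Z}, \ |d| \le |\Delta|.$$

It turns out that when the cost function $\mathcal{L}(\cdot)$ is symmetric and monotonically increasing for $x > 0$, the solution to the above optimization problem is a discrete variant of the staircase mechanism in the continuous setting.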

B. Optimal Mechanism in the Discrete Setting 

As in the continuous setting, we also assume that the cost function $\mathcal{L}(\cdot)$ is symmetric and monotonically increasing for $x > 0$, i.e.,

Property 3.

$$\mathcal{L}(i) = \mathcal{L}(-i), \ \forall i \in \mathbb{Z}, \tag{64}$$

$$\mathcal{L}(x) \le \mathcal{L}(y), \ \forall x, y \in \mathbb{Z}, \ 0 \le x \le y. \tag{65}$$

The easiest case is $\Delta = 1$. In the case that $\Delta = 1$, the solution is the geometric mechanism, which was proposed in [4].

Theorem 10. If the cost function $\mathcal{L}(\cdot)$ satisfies Property 3 and $\Delta = 1$, then the geometric mechanism, which has a probability mass function $\mathcal{P}$ with $\mathcal{P}(i) = \frac{1 - b}{1 + b} b^{|i|}, \forall i \in \mathbb{Z}$, is the optimal solution to (63).

Proof: See Appendix F. ■
For fixed general $\Delta \ge 2$, consider a class of symmetric and staircase-shaped probability mass functions defined as follows. Given an integer $1 \le r \le \Delta$, denote by $\mathcal{P}_r$ the probability mass function defined by

$$\mathcal{P}_r(i) = \begin{cases} a(r) & 0 \le i < r \\ e^{-\epsilon} a(r) & r \le i < \Delta \\ e^{-k\epsilon}\,\mathcal{P}_r(i - k\Delta) & k\Delta \le i < (k+1)\Delta \text{ for } k \in \mathbb{N} \\ \mathcal{P}_r(-i) & i < 0 \end{cases} \tag{66}$$

for $i \in \mathbb{Z}$, where

$$a(r) \triangleq \frac{1 - b}{2r + 2b(\Delta - r) - (1 - b)}. \tag{67}$$

It is easy to verify that for any $1 \le r \le \Delta$, $\mathcal{P}_r$ is a valid probability mass function and it satisfies the $\epsilon$-differential privacy constraint (62).

Let $\mathcal{SP}$ be the set of all probability mass functions which satisfy the $\epsilon$-differential privacy constraint (62).
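Theorem 11. For $\Delta \ge 2$, if the cost function $\mathcal{L}(x)$ satisfies Property 3, then

$$\inf_{\mathcal{P} \in \mathcal{SP}} \sum_{i=-\infty}^{+\infty} \mathcal{L}(i)\,\mathcal{P}(i) = \min_{\{r \in \mathbb{N} \mid 1 \le r \le \Delta\}} \sum_{i=-\infty}^{+\infty} \mathcal{L}(i)\,\mathcal{P}_r(i). \tag{68}$$

Proof: See Appendix F. ■
Therefore, the optimal noise probability distribution to preserve $\epsilon$-differential privacy for an integer-valued query function has a staircase-shaped probability mass function, which is specified by three parameters $\epsilon$, $\Delta$ and $r^* = \arg\min_{\{r \in \mathbb{N} \mid 1 \le r \le \Delta\}} \sum_{i=-\infty}^{+\infty} \mathcal{L}(i)\,\mathcal{P}_r(i)$.

In the case $\Delta = 1$, the staircase-shaped probability mass function reduces to the geometric mechanism.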
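The discrete construction can be checked numerically in the same spirit as the continuous one. The sketch below implements the probability mass function from (66)-(67) under a hypothetical name, `discrete_staircase_pmf`; it is meant for small experiments such as confirming normalization, the constraint (62), and the reduction to the geometric mechanism when $\Delta = r = 1$.

```python
import math


def discrete_staircase_pmf(i, eps, delta, r):
    """P_r(i) from (66)-(67) for integer sensitivity delta and 1 <= r <= delta."""
    b = math.exp(-eps)
    a = (1.0 - b) / (2 * r + 2 * b * (delta - r) - (1.0 - b))
    i = abs(i)                       # symmetry: P_r(i) = P_r(-i)
    k, offset = divmod(i, delta)     # period index and position inside the period
    return (b ** k) * (a if offset < r else b * a)
```

Choosing $r^*$ for a given cost then amounts to evaluating $\sum_i \mathcal{L}(i)\,\mathcal{P}_r(i)$ for each of the $\Delta$ candidates $r = 1, \dots, \Delta$ and keeping the smallest.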
C. Optimality Among All Randomized Mechanisms 



In this section, we extend the optimality result of the staircase mechanism in Section VI to the discrete setting.

In the discrete setting, a generic randomized mechanism $\mathcal{K}$ is a family of probability measures over the domain $\mathbb{Z}$, i.e.,

$$\mathcal{K} = \{\mu_i : i \in \mathbb{Z}\}, \tag{69}$$

where $\mu_i$ is the probability measure of the randomized output when the query output is $i \in \mathbb{Z}$, and $\mu_i$ is a discrete probability mass function over $\mathbb{Z}$.

The $\epsilon$-differential privacy constraint is that for any $i_1, i_2 \in \mathbb{Z}$ such that $|i_1 - i_2| \le \Delta$, and for any subset $S \subset \mathbb{Z}$,

$$\mu_{i_1}(S) \le e^{\epsilon} \mu_{i_2}(S), \tag{70}$$

equivalently, $\forall j \in \mathbb{Z}$,

$$\mu_{i_1}(j) \le e^{\epsilon} \mu_{i_2}(j). \tag{71}$$

The goal is to minimize the worst-case cost

$$\text{minimize} \ \sup_{i \in \mathbb{Z}} \sum_{j=-\infty}^{+\infty} \mathcal{L}(j - i)\,\mu_i(j) \tag{72}$$

subject to the differential privacy constraint (71).

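Given $\mu_i$, define a discrete probability mass function $\nu_i$ via

$$\nu_i(j) \triangleq \mu_i(j + i), \ \forall j \in \mathbb{Z}. \tag{73}$$

One can view $\nu_i$ as the probability measure of the noise added to the query output $i$. Then, equivalently, the differential privacy constraint (71) is

$$\nu_{i_1}(j) \le e^{\epsilon} \nu_{i_2}(j + i_1 - i_2), \ \forall j \in \mathbb{Z}, \ |i_1 - i_2| \le \Delta, \tag{74}$$

and the optimization problem is to minimize

$$\sup_{i \in \mathbb{Z}} \sum_{j=-\infty}^{+\infty} \mathcal{L}(j)\,\nu_i(j). \tag{75}$$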

We will prove that under a technical condition on the families of probability measures, the optimal mechanism is to add query-output-independent noise with staircase-shaped probability mass function.
For any integer $n \ge 1$, define

$$\mathcal{K}_n \triangleq \left\{ \{\nu_i\}_{i \in \mathbb{Z}} \ \middle|\ \{\nu_i\}_{i \in \mathbb{Z}} \text{ satisfies (74), and } \nu_{i+n} = \nu_i, \forall i \in \mathbb{Z} \right\}. \tag{76}$$

Theorem 12. Among the collection of families of probability measures in

$$\bigcup_{n \ge 1} \mathcal{K}_n, \tag{77}$$

the optimal family of probability measures $\{\nu_i\}_{i \in \mathbb{Z}}$ to minimize $\sup_{i \in \mathbb{Z}} \sum_{j=-\infty}^{+\infty} \mathcal{L}(j)\,\nu_i(j)$ is that $\forall i \in \mathbb{Z}$, $\nu_i = \mathcal{P}_{r^*}$, where

$$r^* = \arg\min_{\{r \in \mathbb{N} \mid 1 \le r \le \Delta\}} \sum_{i=-\infty}^{+\infty} \mathcal{L}(i)\,\mathcal{P}_r(i). \tag{78}$$

The proof is essentially the same as the proof of Theorem 8, and thus is omitted.

Appendix A
Proof of Theorem 1

In this section, we give a detailed and rigorous proof of Theorem 1.



A. Outline of Proof 

The key idea of the proof is to use a sequence of probability distributions with piecewise constant probability density functions to approximate any probability distribution satisfying the differential privacy constraint (11). The proof consists of 8 steps in total, and in each step we narrow down the set of probability distributions in which the optimal probability distribution should lie:

• Step 1 proves that we only need to consider symmetric probability distributions.

• Step 2 and Step 3 prove that we only need to consider probability distributions which have symmetric piecewise constant probability density functions.

• Step 4 proves that we only need to consider those symmetric piecewise constant probability density functions which are monotonically decreasing for $x > 0$.

• Step 5 proves that the optimal probability density function should periodically decay.

• Step 6, Step 7 and Step 8 prove that the optimal probability density function over the interval $[0, \Delta)$ is a step function, and they conclude the proof of Theorem 1.



B. Step 1 



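Define

$$V^* := \inf_{\mathcal{P} \in \mathcal{SP}} \int_{x \in \mathbb{R}} \mathcal{L}(x)\mathcal{P}(dx). \tag{79}$$

Our goal is to prove that $V^* = \inf_{\gamma \in [0,1]} \int_{x\in\mathbb{R}} \mathcal{L}(x)\mathcal{P}_\gamma(dx)$.
If $V^* = +\infty$, then due to the definition of $V^*$, we have

$$\inf_{\gamma \in [0,1]} \int_{x\in\mathbb{R}} \mathcal{L}(x)\mathcal{P}_\gamma(dx) \ge V^* = +\infty, \tag{80}$$

and thus $\inf_{\gamma \in [0,1]} \int_{x\in\mathbb{R}} \mathcal{L}(x)\mathcal{P}_\gamma(dx) = V^* = +\infty$. So we only need to consider the case $V^* < +\infty$, i.e., $V^*$ is finite. Therefore, in
the rest of the proof, we assume $V^*$ is finite.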



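First we prove that we only need to consider symmetric probability measures.
Lemma 13. Given $\mathcal{P} \in \mathcal{SP}$, define a symmetric probability distribution $\mathcal{P}_{sym}$ as

$$\mathcal{P}_{sym}(S) := \frac{\mathcal{P}(S) + \mathcal{P}(-S)}{2}, \quad \forall \text{ measurable set } S \subseteq \mathbb{R}, \tag{81}$$

where the set $-S = \{-x \mid x \in S\}$. Then $\mathcal{P}_{sym} \in \mathcal{SP}$, i.e., $\mathcal{P}_{sym}$ satisfies the differential privacy constraint (11), and

$$\int_{x\in\mathbb{R}} \mathcal{L}(x)\mathcal{P}_{sym}(dx) = \int_{x\in\mathbb{R}} \mathcal{L}(x)\mathcal{P}(dx). \tag{82}$$

Proof: It is easy to verify that $\mathcal{P}_{sym}$ is a valid probability distribution. Due to the definition of $\mathcal{P}_{sym}$ in (81), we have

$$\mathcal{P}_{sym}(S) = \frac{\mathcal{P}(S) + \mathcal{P}(-S)}{2} = \mathcal{P}_{sym}(-S), \tag{83}$$

for any measurable set $S \subseteq \mathbb{R}$. Thus, $\mathcal{P}_{sym}$ is a symmetric probability distribution.

Next, we show that $\mathcal{P}_{sym}$ satisfies (11). Indeed, $\forall$ measurable set $S \subseteq \mathbb{R}$ and $\forall |d| \le \Delta$,

\begin{align}
\mathcal{P}_{sym}(S) &= \frac{\mathcal{P}(S) + \mathcal{P}(-S)}{2} \tag{84} \\
&\le \frac{e^{\epsilon}\mathcal{P}(S + d) + e^{\epsilon}\mathcal{P}(-S - d)}{2} \tag{85} \\
&= \frac{e^{\epsilon}\mathcal{P}(S + d) + e^{\epsilon}\mathcal{P}(-(S + d))}{2} \tag{86} \\
&= e^{\epsilon}\mathcal{P}_{sym}(S + d), \tag{87}
\end{align}

where in (85) we use the facts $\mathcal{P}(S) \le e^{\epsilon}\mathcal{P}(S + d)$ and $\mathcal{P}(-S) \le e^{\epsilon}\mathcal{P}(-S - d)$.

Lastly, since $\mathcal{L}(x)$ is symmetric,

\begin{align}
\int_{x\in\mathbb{R}} \mathcal{L}(x)\mathcal{P}(dx) &= \int_{x\in\mathbb{R}} \frac{\mathcal{L}(x) + \mathcal{L}(-x)}{2}\mathcal{P}(dx) \tag{88} \\
&= \int_{x\in\mathbb{R}} \mathcal{L}(x)\mathcal{P}_{sym}(dx). \tag{89}
\end{align}

Therefore, if we define

$$\mathcal{SP}_{sym} := \{\mathcal{P}_{sym} \mid \mathcal{P} \in \mathcal{SP}\}, \tag{90}$$

due to Lemma 13,
Lemma 14.

$$V^* = \inf_{\mathcal{P} \in \mathcal{SP}_{sym}} \int_{x\in\mathbb{R}} \mathcal{L}(x)\mathcal{P}(dx). \tag{91}$$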

C. Step 2 

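Next we prove that for any probability distribution $\mathcal{P}$ satisfying the differential privacy constraint (11), the probability $\Pr(X = x) = 0$, $\forall x \in \mathbb{R}$, and $\mathcal{P}([y, z]) \ne 0$ for any $y < z \in \mathbb{R}$.

Lemma 15. $\forall \mathcal{P} \in \mathcal{SP}$, $\forall x \in \mathbb{R}$, $\mathcal{P}(\{x\}) = 0$. And, for any $y < z \in \mathbb{R}$, $\mathcal{P}([y, z]) \ne 0$.

Proof: Given $\mathcal{P} \in \mathcal{SP}$, suppose $\mathcal{P}(\{x_0\}) = p_0 > 0$ for some $x_0 \in \mathbb{R}$. Then for any $x \in [x_0, x_0 + \Delta]$,

$$\mathcal{P}(\{x\}) \ge p_0 e^{-\epsilon}, \tag{92}$$

due to (11).

So $\mathcal{P}(\{x\})$ is strictly lower bounded by a positive constant for an uncountable number of $x$, and thus $\mathcal{P}([x_0, x_0 + \Delta]) = +\infty$,
which contradicts the fact that $\mathcal{P}$ is a probability distribution.
Therefore, $\forall \mathcal{P} \in \mathcal{SP}$, $\forall x \in \mathbb{R}$, $\mathcal{P}(\{x\}) = 0$.

Suppose $\mathcal{P}([y, z]) = 0$ for some $y < z \in \mathbb{R}$. Then from (11) we have for any $|d| \le \Delta$,

$$\mathcal{P}([y + d, z + d]) \le e^{\epsilon}\mathcal{P}([y, z]) = 0, \tag{93}$$

and thus $\mathcal{P}([y + d, z + d]) = 0$. By induction, for any $k \in \mathbb{Z}$, $\mathcal{P}([y + kd, z + kd]) = 0$. Taking $0 < d \le \min(\Delta, z - y)$, these intervals cover $\mathbb{R}$, which implies that $\mathcal{P}((-\infty, +\infty)) = 0$.
Contradiction. So for any $y < z \in \mathbb{R}$, $\mathcal{P}([y, z]) \ne 0$. ■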

D. Step 3 

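In this subsection, we show that for any $\mathcal{P} \in \mathcal{SP}_{sym}$ with

$$V(\mathcal{P}) := \int_{x\in\mathbb{R}} \mathcal{L}(x)\mathcal{P}(dx) < +\infty, \tag{94}$$

we can use a sequence of probability measures $\{\mathcal{P}_i \in \mathcal{SP}_{sym}\}_{i \ge 1}$ with symmetric piecewise constant probability density
functions to approximate $\mathcal{P}$ with $\lim_{i\to+\infty} V(\mathcal{P}_i) = V(\mathcal{P})$.

Lemma 16. Given $\mathcal{P} \in \mathcal{SP}_{sym}$ with $V(\mathcal{P}) < +\infty$, and any positive integer $i \in \mathbb{N}$, define $\mathcal{P}_i$ as the probability distribution with a
symmetric probability density function $f_i(x)$ defined as

$$f_i(x) = \begin{cases} a_k := \dfrac{\mathcal{P}\left([k\frac{\Delta}{i}, (k+1)\frac{\Delta}{i})\right)}{\Delta / i} & x \in [k\frac{\Delta}{i}, (k+1)\frac{\Delta}{i}) \text{ for } k \in \mathbb{N} \\ f_i(-x) & x < 0 \end{cases} \tag{95}$$

Then $\mathcal{P}_i \in \mathcal{SP}_{sym}$ and

$$\lim_{i\to+\infty} V(\mathcal{P}_i) = V(\mathcal{P}). \tag{96}$$

Proof: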

First we prove that $\mathcal{P}_i \in \mathcal{SP}_{sym}$, i.e., $\mathcal{P}_i$ is symmetric and satisfies the differential privacy constraint (11).
By definition $f_i(x)$ is a symmetric and nonnegative function, and

\begin{align}
\int_{-\infty}^{+\infty} f_i(x)dx &= 2\int_{0}^{+\infty} f_i(x)dx \tag{97} \\
&= 2\int_{x\in[0,+\infty)} \mathcal{P}(dx) \tag{98} \\
&= 2\int_{x\in(0,+\infty)} \mathcal{P}(dx) \tag{99} \\
&= 1, \tag{100}
\end{align}

where in (99) we used the fact $\mathcal{P}(\{0\}) = 0$ due to Lemma 15. In addition, due to Lemma 15, $a_k > 0$, $\forall k \in \mathbb{N}$.

So $f_i(x)$ is a valid symmetric probability density function, and thus $\mathcal{P}_i$ is a valid symmetric probability distribution.
Define the density sequence of $\mathcal{P}_i$ as the sequence $\{a_0, a_1, a_2, \ldots, a_n, \ldots\}$. Since $\mathcal{P}$ satisfies (11), it is easy to see that

$$a_j \le e^{\epsilon}a_{j+k} \text{ and } a_{j+k} \le e^{\epsilon}a_j, \quad \forall j \ge 0, \ 0 \le k \le i. \tag{101}$$

Therefore, for any $x, y$ such that $|x - y| \le \Delta$, we have

$$f_i(x) \le e^{\epsilon}f_i(y) \text{ and } f_i(y) \le e^{\epsilon}f_i(x), \tag{102}$$

which implies that $\mathcal{P}_i$ satisfies (11). Hence, $\mathcal{P}_i \in \mathcal{SP}_{sym}$.
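Next we show that

$$\lim_{i\to+\infty} V(\mathcal{P}_i) = V(\mathcal{P}). \tag{103}$$

Since $\mathcal{L}(x)$ satisfies Property 5, we can assume there exists a constant $B > 0$ such that

$$\mathcal{L}(x + 1) \le B\mathcal{L}(x), \quad \forall x \ge T. \tag{104}$$

Given $\delta > 0$, since $V(\mathcal{P})$ is finite, there exists an integer $T^* > T$ such that

$$\int_{x\in[T^*,+\infty)} \mathcal{L}(x)\mathcal{P}(dx) < \frac{\delta}{B}. \tag{105}$$

For any integers $i \ge 1$, $N \ge T^*$,

\begin{align}
\int_{x\in[N,N+1)} \mathcal{L}(x)\mathcal{P}_i(dx) &\le \mathcal{P}_i([N, N+1))\mathcal{L}(N+1) \tag{106} \\
&= \mathcal{P}([N, N+1))\mathcal{L}(N+1) \tag{107} \\
&\le \int_{x\in[N,N+1)} B\mathcal{L}(x)\mathcal{P}(dx). \tag{108}
\end{align}

Therefore,

\begin{align}
\int_{x\in[T^*,+\infty)} \mathcal{L}(x)\mathcal{P}_i(dx) &\le \int_{x\in[T^*,+\infty)} B\mathcal{L}(x)\mathcal{P}(dx) \tag{109} \\
&\le B\frac{\delta}{B} \tag{110} \\
&= \delta. \tag{111}
\end{align}

For $x \in [0, T^*)$, $\mathcal{L}(x)$ is a bounded function, and thus by the definition of the Riemann-Stieltjes integral, we have

$$\lim_{i\to\infty} \int_{x\in[0,T^*)} \mathcal{L}(x)\mathcal{P}_i(dx) = \int_{x\in[0,T^*)} \mathcal{L}(x)\mathcal{P}(dx). \tag{112}$$

So there exists a sufficiently large integer $i^*$ such that for all $i \ge i^*$

$$\left| \int_{x\in[0,T^*)} \mathcal{L}(x)\mathcal{P}_i(dx) - \int_{x\in[0,T^*)} \mathcal{L}(x)\mathcal{P}(dx) \right| \le \delta. \tag{113}$$

Hence, for all $i \ge i^*$

\begin{align}
& |V(\mathcal{P}_i) - V(\mathcal{P})| \tag{114} \\
&= \left| \int_{x\in\mathbb{R}} \mathcal{L}(x)\mathcal{P}_i(dx) - \int_{x\in\mathbb{R}} \mathcal{L}(x)\mathcal{P}(dx) \right| \tag{115} \\
&= 2\left| \int_{x\in[0,T^*)} \mathcal{L}(x)\mathcal{P}_i(dx) - \int_{x\in[0,T^*)} \mathcal{L}(x)\mathcal{P}(dx) + \int_{x\in[T^*,+\infty)} \mathcal{L}(x)\mathcal{P}_i(dx) - \int_{x\in[T^*,+\infty)} \mathcal{L}(x)\mathcal{P}(dx) \right| \tag{116} \\
&\le 2\left| \int_{x\in[0,T^*)} \mathcal{L}(x)\mathcal{P}_i(dx) - \int_{x\in[0,T^*)} \mathcal{L}(x)\mathcal{P}(dx) \right| + 2\int_{x\in[T^*,+\infty)} \mathcal{L}(x)\mathcal{P}_i(dx) + 2\int_{x\in[T^*,+\infty)} \mathcal{L}(x)\mathcal{P}(dx) \tag{117} \\
&\le 2\left(\delta + \delta + \frac{\delta}{B}\right) \tag{118} \\
&\le \left(4 + \frac{2}{B}\right)\delta. \tag{119}
\end{align}

Therefore,

$$\lim_{i\to+\infty} \int_{x\in\mathbb{R}} \mathcal{L}(x)\mathcal{P}_i(dx) = \int_{x\in\mathbb{R}} \mathcal{L}(x)\mathcal{P}(dx). \tag{120}$$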

Define $\mathcal{SP}_{i,sym} := \{\mathcal{P}_i \mid \mathcal{P} \in \mathcal{SP}_{sym}\}$ for $i \ge 1$, i.e., $\mathcal{SP}_{i,sym}$ is the set of probability distributions satisfying the differential
privacy constraint (11) and having symmetric piecewise constant (over intervals $[k\frac{\Delta}{i}, (k+1)\frac{\Delta}{i})$, $\forall k \in \mathbb{N}$) probability density
functions.

Due to Lemma 16,

Lemma 17.

$$V^* = \inf_{\mathcal{P} \in \cup_{i=1}^{\infty}\mathcal{SP}_{i,sym}} \int_{x\in\mathbb{R}} \mathcal{L}(x)\mathcal{P}(dx). \tag{121}$$

Therefore, to characterize $V^*$, we only need to study probability distributions with symmetric and piecewise constant
probability density functions.



E. Step 4 

Next we show that indeed we only need to consider those probability distributions with symmetric piecewise constant 
probability density functions which are monotonically decreasing when x > 0. 

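Lemma 18. Given $\mathcal{P}_a \in \mathcal{SP}_{i,sym}$ with symmetric piecewise constant probability density function $f(\cdot)$, let $\{a_0, a_1, \ldots, a_n, \ldots\}$
be the density sequence of $f(\cdot)$, i.e.,

$$f(x) = a_k, \quad x \in [k\tfrac{\Delta}{i}, (k+1)\tfrac{\Delta}{i}), \ \forall k \in \mathbb{N}. \tag{122}$$

Then we can construct a new probability distribution $\mathcal{P}_b \in \mathcal{SP}_{i,sym}$ the probability density function of which is monotonically
decreasing when $x \ge 0$, and

$$\int_{x\in\mathbb{R}} \mathcal{L}(x)\mathcal{P}_b(dx) \le \int_{x\in\mathbb{R}} \mathcal{L}(x)\mathcal{P}_a(dx). \tag{123}$$

Proof: Since $a_k > 0$, $\forall k \in \mathbb{N}$, and

$$\sum_{k=0}^{+\infty} a_k\frac{\Delta}{i} = \frac{1}{2}, \tag{124}$$

we have $\lim_{k\to+\infty} a_k = 0$.

Given the density sequence $\{a_0, a_1, \ldots, a_n, \ldots\}$, construct a new monotonically decreasing density sequence $\{b_0, b_1, \ldots, b_n, \ldots\}$
and a bijective mapping $\pi : \mathbb{N} \to \mathbb{N}$ as follows:

\begin{align}
I_0 &= \arg\max_{k\in\mathbb{N}} a_k, \tag{125} \\
\pi(0) &= \min_{n \in I_0} n, \text{ i.e., the smallest element in } I_0, \tag{126} \\
b_0 &= a_{\pi(0)}, \tag{127}
\end{align}

and $\forall m \in \mathbb{N}$ with $m \ge 1$, (129)

\begin{align}
I_m &= \arg\max_{k\in\mathbb{N}\setminus\{\pi(j) \mid j < m\}} a_k, \tag{130} \\
\pi(m) &= \min_{n \in I_m} n, \text{ i.e., the smallest element in } I_m, \tag{131} \\
b_m &= a_{\pi(m)}. \tag{132}
\end{align}

Since the sequence $\{a_k\}$ converges to 0, the maximum of $\{a_k\}$ always exists in (125) and (130). Therefore, $I_m$ is well
defined for all $m \in \mathbb{N}$.

Note that since $\sum_{k=0}^{\infty} a_k\frac{\Delta}{i} = \frac{1}{2}$ and the sequence $\{b_k\}_{k\in\mathbb{N}}$ is simply a permutation of $\{a_k\}_{k\in\mathbb{N}}$, $\sum_{k=0}^{\infty} b_k\frac{\Delta}{i} = \frac{1}{2}$.
Therefore, if we define a function $g(\cdot)$ as

$$g(x) = \begin{cases} b_k & x \in [k\frac{\Delta}{i}, (k+1)\frac{\Delta}{i}) \text{ for } k \in \mathbb{N} \\ g(-x) & x < 0 \end{cases} \tag{133}$$

then $g(\cdot)$ is a valid symmetric probability density function, and

$$\int_{x\in\mathbb{R}} \mathcal{L}(x)g(x)dx \le \int_{x\in\mathbb{R}} \mathcal{L}(x)f(x)dx. \tag{134}$$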
Next, we prove that the probability distribution $\mathcal{P}_b$ with probability density function $g(\cdot)$ satisfies the differential privacy
constraint (11). Since $\{b_k\}_{k\in\mathbb{N}}$ is a monotonically decreasing sequence, it is sufficient and necessary to prove that for all $k \in \mathbb{N}$,

$$\frac{b_k}{b_{k+i}} \le e^{\epsilon}. \tag{135}$$

To simplify notation, given $k$, we define

$$a^*(k) = \min_{k \le j \le k+i} a_j, \tag{136}$$

i.e., $a^*(k)$ denotes the smallest number of $\{a_k, a_{k+1}, \ldots, a_{k+i}\}$.

First, when $k = 0$, it is easy to prove that $\frac{b_0}{b_i} \le e^{\epsilon}$. Indeed, recall that $b_0 = a_{\pi(0)}$ and consider the $i + 1$ consecutive numbers
$\{a_{\pi(0)}, a_{\pi(0)+1}, \ldots, a_{\pi(0)+i}\}$ in the original sequence $\{a_k\}_{k\in\mathbb{N}}$. Then $a^*(\pi(0)) \le b_i$, since $b_i$ is the $(i+1)$th largest number in
the sequence $\{a_k\}_{k\in\mathbb{N}}$. Therefore,

$$\frac{b_0}{b_i} = \frac{a_{\pi(0)}}{b_i} \le \frac{a_{\pi(0)}}{a^*(\pi(0))} \le e^{\epsilon}. \tag{137}$$

For $k = 1$, $b_1 = a_{\pi(1)}$ and consider the $i + 1$ consecutive numbers $\{a_{\pi(1)}, a_{\pi(1)+1}, \ldots, a_{\pi(1)+i}\}$. If $\pi(0) \notin [\pi(1), \pi(1) + i]$,
then $a^*(\pi(1)) \le b_{i+1}$, and thus

$$\frac{b_1}{b_{i+1}} = \frac{a_{\pi(1)}}{b_{1+i}} \le \frac{a_{\pi(1)}}{a^*(\pi(1))} \le e^{\epsilon}. \tag{138}$$

If $\pi(0) \in [\pi(1), \pi(1) + i]$, then $a^*(\pi(0)) \le b_{i+1}$ and $\frac{a_{\pi(0)}}{a^*(\pi(0))} \le e^{\epsilon}$. Therefore,

$$\frac{b_1}{b_{i+1}} \le \frac{b_0}{b_{i+1}} \le \frac{b_0}{a^*(\pi(0))} \le e^{\epsilon}. \tag{139}$$

Hence, $\frac{b_k}{b_{k+i}} \le e^{\epsilon}$ holds for $k = 1$.

In general, given $k$, we prove $\frac{b_k}{b_{k+i}} \le e^{\epsilon}$ as follows. First, if $\pi(j) \notin [\pi(k), \pi(k) + i]$, $\forall j < k$, then $a^*(\pi(k)) \le b_{k+i}$, and hence

$$\frac{b_k}{b_{i+k}} = \frac{a_{\pi(k)}}{b_{i+k}} \le \frac{a_{\pi(k)}}{a^*(\pi(k))} \le e^{\epsilon}. \tag{140}$$

If there exists $j < k$ with $\pi(j) \in [\pi(k) + 1, \pi(k) + i]$, we use Algorithm 2 to compute a number $j^*$ such that $j^* < k$ and
$\pi(j) \notin [\pi(j^*) + 1, \pi(j^*) + i]$, $\forall j < k$.

Algorithm 2

$j^* \leftarrow k$
while there exists some $j < k$ and $\pi(j) \in [\pi(j^*) + 1, \pi(j^*) + i]$ do
    $j^* \leftarrow j$
end while
Output $j^*$

It is easy to show that the loop in Algorithm 2 will terminate after at most $k$ steps, since $\pi(j^*)$ strictly increases in every iteration.
After finding $j^*$, we have $j^* < k$, and $a^*(\pi(j^*)) \le b_{k+i}$. Therefore

$$\frac{b_k}{b_{i+k}} \le \frac{a_{\pi(j^*)}}{b_{i+k}} \le \frac{a_{\pi(j^*)}}{a^*(\pi(j^*))} \le e^{\epsilon}. \tag{141}$$

So $\frac{b_k}{b_{k+i}} \le e^{\epsilon}$ holds for all $k \in \mathbb{N}$. Therefore, $\mathcal{P}_b \in \mathcal{SP}_{i,sym}$.

This completes the proof of Lemma 18. ■
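The rearrangement in Lemma 18 can be checked numerically on a finite prefix of a density sequence. The sketch below is an illustration under stated assumptions, not part of the proof: it works with a finite sequence (the lemma concerns infinite sequences), generates a test sequence whose logarithm changes by at most $\epsilon / i$ per step so that the window constraint (101) holds, and the helper names `satisfies_window_ratio`, `random_density_sequence` and `rearrange_decreasing` are hypothetical.

```python
import math
import random

def satisfies_window_ratio(seq, i, eps, tol=1e-12):
    """True if seq[j] <= e^eps * seq[l] whenever |j - l| <= i."""
    bound = math.exp(eps) * (1 + tol)
    n = len(seq)
    return all(seq[j] <= bound * seq[l]
               for j in range(n)
               for l in range(max(0, j - i), min(n, j + i + 1)))

def random_density_sequence(n, i, eps, seed=0):
    """Positive sequence whose log moves by at most eps / i per step."""
    rng = random.Random(seed)
    log_a, seq = 0.0, []
    for _ in range(n):
        seq.append(math.exp(log_a))
        log_a += rng.uniform(-eps / i, eps / i)
    return seq

def rearrange_decreasing(seq):
    """Sort the density values in descending order, as in (125)-(132)."""
    return sorted(seq, reverse=True)
```

Applying `rearrange_decreasing` to a sequence for which `satisfies_window_ratio` holds should again yield a sequence for which it holds, which is the content of (135).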



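Therefore, if we define

$$\mathcal{SP}_{i,md} := \{\mathcal{P} \mid \mathcal{P} \in \mathcal{SP}_{i,sym}, \text{ and the density sequence of } \mathcal{P} \text{ is monotonically decreasing}\}, \tag{142}$$

then due to Lemma 18,
Lemma 19.

$$V^* = \inf_{\mathcal{P} \in \cup_{i=1}^{\infty}\mathcal{SP}_{i,md}} \int_{x\in\mathbb{R}} \mathcal{L}(x)\mathcal{P}(dx). \tag{143}$$

F. Step 5

Next we show that among all symmetric piecewise constant probability density functions, we only need to consider those
which are periodically decaying.

More precisely, given a positive integer $i$, define

$$\mathcal{SP}_{i,pd} := \{\mathcal{P} \mid \mathcal{P} \in \mathcal{SP}_{i,md}, \text{ and } \mathcal{P} \text{ has density sequence } \{a_0, a_1, \ldots, a_n, \ldots\} \text{ satisfying } \tfrac{a_k}{a_{k+i}} = e^{\epsilon}, \forall k \in \mathbb{N}\}, \tag{144}$$

then

Lemma 20.

$$V^* = \inf_{\mathcal{P} \in \cup_{i=1}^{\infty}\mathcal{SP}_{i,pd}} \int_{x\in\mathbb{R}} \mathcal{L}(x)\mathcal{P}(dx). \tag{145}$$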
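Proof: Due to Lemma 19, we only need to consider probability distributions with symmetric and piecewise constant
probability density functions which are monotonically decreasing for $x \ge 0$.

We first show that given $\mathcal{P}_a \in \mathcal{SP}_{i,md}$ with density sequence $\{a_0, a_1, \ldots, a_n, \ldots\}$, if $\frac{a_0}{a_i} < e^{\epsilon}$, then we can construct a
probability distribution $\mathcal{P}_b \in \mathcal{SP}_{i,md}$ with density sequence $\{b_0, b_1, \ldots, b_n, \ldots\}$ such that $\frac{b_0}{b_i} = e^{\epsilon}$ and

$$V(\mathcal{P}_a) \ge V(\mathcal{P}_b). \tag{146}$$

Define a new sequence $\{b_0, b_1, \ldots, b_n, \ldots\}$ by scaling up $a_0$ and scaling down $\{a_1, a_2, \ldots\}$. More precisely, let

$$\delta = \frac{e^{\epsilon}a_i\frac{i}{2\Delta}}{a_0\left(\frac{i}{2\Delta} - a_0 + e^{\epsilon}a_i\right)} - 1 > 0,$$

and set

\begin{align}
b_0 &= a_0(1 + \delta), \tag{147} \\
b_k &= a_k(1 - \delta'), \quad \forall k \ge 1, \tag{148}
\end{align}

where $\delta' = \frac{a_0\delta}{\frac{i}{2\Delta} - a_0} > 0$, and we have chosen $\delta$ such that $\frac{b_0}{b_i} = \frac{a_0(1+\delta)}{a_i(1-\delta')} = e^{\epsilon}$. The choice of $\delta'$ keeps the total mass unchanged, because $\sum_{k \ge 1} a_k = \frac{i}{2\Delta} - a_0$ by (124).

It is easy to see that the sequence $\{b_0, b_1, \ldots, b_n, \ldots\}$ corresponds to a valid probability density function and it also satisfies
the differential privacy constraint (11), i.e.,

$$\frac{b_k}{b_{k+i}} \le e^{\epsilon}, \quad \forall k \ge 0. \tag{149}$$

Let $\mathcal{P}_b$ be the probability distribution with $\{b_0, b_1, \ldots, b_n, \ldots\}$ as the density sequence of its probability density function.
Next we show $V(\mathcal{P}_b) \le V(\mathcal{P}_a)$.
It is easy to compute $V(\mathcal{P}_a)$, which is

$$V(\mathcal{P}_a) = 2\left(a_0\int_0^{\frac{\Delta}{i}} \mathcal{L}(x)dx + \sum_{k=1}^{\infty} a_k\int_{k\frac{\Delta}{i}}^{(k+1)\frac{\Delta}{i}} \mathcal{L}(x)dx\right). \tag{150}$$

Similarly, we can compute $V(\mathcal{P}_b)$ by

\begin{align}
V(\mathcal{P}_b) &= 2\left(b_0\int_0^{\frac{\Delta}{i}} \mathcal{L}(x)dx + \sum_{k=1}^{\infty} b_k\int_{k\frac{\Delta}{i}}^{(k+1)\frac{\Delta}{i}} \mathcal{L}(x)dx\right) \tag{151} \\
&= V(\mathcal{P}_a) + 2\left(a_0\delta\int_0^{\frac{\Delta}{i}} \mathcal{L}(x)dx - \delta'\sum_{k=1}^{\infty} a_k\int_{k\frac{\Delta}{i}}^{(k+1)\frac{\Delta}{i}} \mathcal{L}(x)dx\right) \tag{152} \\
&= V(\mathcal{P}_a) + 2\frac{a_0\delta}{\frac{i}{2\Delta} - a_0}\left(\sum_{k=1}^{\infty} a_k\int_0^{\frac{\Delta}{i}} \mathcal{L}(x)dx - \sum_{k=1}^{\infty} a_k\int_{k\frac{\Delta}{i}}^{(k+1)\frac{\Delta}{i}} \mathcal{L}(x)dx\right) \tag{153} \\
&= V(\mathcal{P}_a) + 2\frac{a_0\delta}{\frac{i}{2\Delta} - a_0}\sum_{k=1}^{\infty} a_k\left(\int_0^{\frac{\Delta}{i}} \mathcal{L}(x)dx - \int_{k\frac{\Delta}{i}}^{(k+1)\frac{\Delta}{i}} \mathcal{L}(x)dx\right) \tag{154} \\
&\le V(\mathcal{P}_a), \tag{155}
\end{align}

where in the last step we used the fact that $\left(\int_0^{\frac{\Delta}{i}} \mathcal{L}(x)dx - \int_{k\frac{\Delta}{i}}^{(k+1)\frac{\Delta}{i}} \mathcal{L}(x)dx\right) \le 0$, since $\mathcal{L}(\cdot)$ is a monotonically increasing
function for $x \ge 0$.

Therefore, for given $i \in \mathbb{N}$, we only need to consider $\mathcal{P} \in \mathcal{SP}_{i,md}$ with density sequence $\{a_0, a_1, \ldots, a_n, \ldots\}$ satisfying
$\frac{a_0}{a_i} = e^{\epsilon}$.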

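Next, we argue that among all probability distributions $\mathcal{P} \in \mathcal{SP}_{i,md}$ with density sequence $\{a_0, a_1, \ldots, a_n, \ldots\}$ satisfying
$\frac{a_0}{a_i} = e^{\epsilon}$, we only need to consider those probability distributions with density sequence also satisfying $\frac{a_1}{a_{i+1}} = e^{\epsilon}$.

Given $\mathcal{P}_a \in \mathcal{SP}_{i,md}$ with density sequence $\{a_0, a_1, \ldots, a_n, \ldots\}$ satisfying $\frac{a_0}{a_i} = e^{\epsilon}$ and $\frac{a_1}{a_{i+1}} < e^{\epsilon}$, we can construct a new
probability distribution $\mathcal{P}_b \in \mathcal{SP}_{i,md}$ with density sequence $\{b_0, b_1, \ldots, b_n, \ldots\}$ satisfying

\begin{align}
\frac{b_0}{b_i} &= e^{\epsilon}, \tag{156} \\
\frac{b_1}{b_{i+1}} &= e^{\epsilon}, \tag{157}
\end{align}

and $V(\mathcal{P}_a) \ge V(\mathcal{P}_b)$.

First, it is easy to see that $a_1$ is strictly less than $a_0$, since if $a_0 = a_1$, then $\frac{a_1}{a_{i+1}} = \frac{a_0}{a_{i+1}} \ge \frac{a_0}{a_i} = e^{\epsilon}$. Then we construct a new
density sequence by increasing $a_1$ and decreasing $a_{i+1}$. More precisely, we define a new sequence $\{b_0, b_1, \ldots, b_n, \ldots\}$ as

\begin{align}
b_k &= a_k, \quad \forall k \ne 1, k \ne i + 1, \tag{158} \\
b_1 &= a_1 + \delta, \tag{159} \\
b_{i+1} &= a_{i+1} - \delta, \tag{160}
\end{align}

where $\delta = \frac{e^{\epsilon}a_{i+1} - a_1}{1 + e^{\epsilon}}$ and thus $\frac{b_1}{b_{i+1}} = e^{\epsilon}$.

It is easy to verify that $\{b_0, b_1, \ldots, b_n, \ldots\}$ is a valid probability density sequence and the corresponding probability
distribution $\mathcal{P}_b$ satisfies the differential privacy constraint (11). Moreover, $V(\mathcal{P}_a) \ge V(\mathcal{P}_b)$. Therefore, we only need to
consider $\mathcal{P} \in \mathcal{SP}_{i,md}$ with density sequences $\{a_0, a_1, \ldots, a_n, \ldots\}$ satisfying $\frac{a_0}{a_i} = e^{\epsilon}$ and $\frac{a_1}{a_{i+1}} = e^{\epsilon}$.

Using the same argument, we can show that we only need to consider $\mathcal{P} \in \mathcal{SP}_{i,md}$ with density sequences $\{a_0, a_1, \ldots, a_n, \ldots\}$
satisfying

$$\frac{a_k}{a_{i+k}} = e^{\epsilon}, \quad \forall k \ge 0. \tag{161}$$

Therefore,

$$V^* = \inf_{\mathcal{P} \in \cup_{i=1}^{\infty}\mathcal{SP}_{i,pd}} \int_{x\in\mathbb{R}} \mathcal{L}(x)\mathcal{P}(dx). \tag{162}$$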

Due to Lemma 20, we only need to consider probability distributions with symmetric, monotonically decreasing (for $x \ge 0$),
and periodically decaying piecewise constant probability density functions. Because of the properties of symmetry and
periodic decay, for this class of probability distributions, the probability density function over $\mathbb{R}$ is completely determined
by the probability density function over the interval $[0, \Delta)$.

Next, we study what the optimal probability density function should be over the interval $[0, \Delta)$. It turns out that the optimal
probability density function over the interval $[0, \Delta)$ is a step function. We use the following three steps to prove this result.

G. Step 6

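Lemma 21. Consider a probability distribution $\mathcal{P}_a \in \mathcal{SP}_{i,pd}$ ($i \ge 2$) with density sequence $\{a_0, a_1, \ldots, a_n, \ldots\}$, and $\frac{a_0}{a_{i-1}} < e^{\epsilon}$.
Then there exists a probability distribution $\mathcal{P}_b \in \mathcal{SP}_{i,pd}$ with density sequence $\{b_0, b_1, \ldots, b_n, \ldots\}$ such that $\frac{b_0}{b_{i-1}} = e^{\epsilon}$, and

$$V(\mathcal{P}_b) \le V(\mathcal{P}_a). \tag{163}$$

Proof:

For each $0 \le k \le (i - 1)$, define

$$w_k := \sum_{j=0}^{+\infty} e^{-j\epsilon}\int_{(j + \frac{k}{i})\Delta}^{(j + \frac{k+1}{i})\Delta} \mathcal{L}(x)dx. \tag{164}$$

Since $\mathcal{L}(\cdot)$ satisfies Property 5 and $V^* < \infty$, it is easy to show that the sum of the series in (164) exists and is finite, and
thus $w_k$ is well defined for all $0 \le k \le (i - 1)$. In addition, it is easy to see

$$w_0 \le w_1 \le w_2 \le \cdots \le w_{i-1}, \tag{165}$$

since $\mathcal{L}(x)$ is a monotonically increasing function when $x \ge 0$.
Then

$$V(\mathcal{P}_a) = \int_{x\in\mathbb{R}} \mathcal{L}(x)\mathcal{P}_a(dx) = 2\sum_{k=0}^{i-1} w_k a_k. \tag{166}$$

Since $\frac{a_0}{a_{i-1}} < e^{\epsilon}$, we can scale $a_0$ up and scale $\{a_1, \ldots, a_{i-1}\}$ down to derive a new valid probability density function with
smaller cost. More precisely, define a new probability measure $\mathcal{P}_b \in \mathcal{SP}_{i,pd}$ with density sequence $\{b_0, b_1, \ldots, b_n, \ldots\}$ via

\begin{align}
b_0 &:= \gamma a_0, \tag{167} \\
b_k &:= \gamma' a_k, \quad \forall 1 \le k \le i - 1, \tag{168}
\end{align}

for some $\gamma > 1$ and $\gamma' < 1$ such that

$$\frac{b_0}{b_{i-1}} = e^{\epsilon}. \tag{169}$$

To make $\{b_0, b_1, \ldots, b_n, \ldots\}$ a valid density sequence, i.e., to make the integral of the corresponding probability density
function over $\mathbb{R}$ equal to 1, we need

$$\sum_{k=0}^{i-1} b_k = \sum_{k=0}^{i-1} a_k = \frac{1 - e^{-\epsilon}}{2}\frac{i}{\Delta}. \tag{170}$$

Define $t := \frac{1 - e^{-\epsilon}}{2}\frac{i}{\Delta}$, then we have two linear equations on $\gamma$ and $\gamma'$:

\begin{align}
\gamma a_0 &= e^{\epsilon}\gamma' a_{i-1}, \tag{171} \\
\gamma a_0 + \gamma'(t - a_0) &= t. \tag{172}
\end{align}

From (171) and (172), we can easily get

\begin{align}
\gamma &= \frac{e^{\epsilon}a_{i-1}t}{a_0(t - a_0 + e^{\epsilon}a_{i-1})} > 1, \tag{173} \\
\gamma' &= \frac{t}{t - a_0 + e^{\epsilon}a_{i-1}} < 1. \tag{174}
\end{align}

Then we can verify that $V(\mathcal{P}_a) \ge V(\mathcal{P}_b)$. Indeed,

\begin{align}
& V(\mathcal{P}_a) - V(\mathcal{P}_b) \tag{175} \\
&= \int_{x\in\mathbb{R}} \mathcal{L}(x)\mathcal{P}_a(dx) - \int_{x\in\mathbb{R}} \mathcal{L}(x)\mathcal{P}_b(dx) \tag{176} \\
&= 2\sum_{k=0}^{i-1} w_k a_k - 2\sum_{k=0}^{i-1} w_k b_k \tag{177} \\
&= 2\left((1 - \gamma)w_0 a_0 + (1 - \gamma')\sum_{k=1}^{i-1} w_k a_k\right) \tag{178} \\
&\ge 2\left((1 - \gamma)w_0 a_0 + (1 - \gamma')\sum_{k=1}^{i-1} w_0 a_k\right) \tag{179} \\
&= 2\left((1 - \gamma)w_0 a_0 + (1 - \gamma')w_0(t - a_0)\right) \tag{180} \\
&= 2w_0\left(\frac{(a_0 - e^{\epsilon}a_{i-1})(t - a_0)}{t - a_0 + e^{\epsilon}a_{i-1}} + (t - a_0)\frac{e^{\epsilon}a_{i-1} - a_0}{t - a_0 + e^{\epsilon}a_{i-1}}\right) \tag{181} \\
&= 0. \tag{182}
\end{align}

This completes the proof. ■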
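The rescaling in (167)-(174) is easy to sanity-check numerically. The following sketch assumes the closed forms of $\gamma$ and $\gamma'$ obtained by solving the two linear equations (171) and (172); the function name `rescale_factors` and the sample numbers are hypothetical.

```python
import math

def rescale_factors(a0, a_last, t, eps):
    """Solve gamma * a0 = e^eps * gamma' * a_last and gamma * a0 + gamma' * (t - a0) = t.

    a0 is the first density value, a_last stands for a_{i-1}, and t is the sum
    a_0 + ... + a_{i-1} over one period.
    """
    denom = t - a0 + math.exp(eps) * a_last
    gamma = math.exp(eps) * a_last * t / (a0 * denom)   # scale-up factor for a_0
    gamma_prime = t / denom                             # scale-down factor for a_1..a_{i-1}
    return gamma, gamma_prime

# Example with one period of a decreasing density sequence and eps = 1.
a = [1.0, 0.9, 0.8, 0.7]
gamma, gamma_prime = rescale_factors(a[0], a[-1], sum(a), 1.0)
b = [gamma * a[0]] + [gamma_prime * x for x in a[1:]]
```

With any nondecreasing weights $w_0 \le \cdots \le w_{i-1}$, the weighted sum $\sum_k w_k b_k$ should not exceed $\sum_k w_k a_k$, which is the inequality established in (175)-(182).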



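Therefore, due to Lemma 21, for all $i \ge 2$, we only need to consider probability distributions $\mathcal{P} \in \mathcal{SP}_{i,pd}$ with density
sequence $\{a_0, a_1, \ldots, a_n, \ldots\}$ satisfying $\frac{a_0}{a_{i-1}} = e^{\epsilon}$.
More precisely, define

$$\mathcal{SP}_{i,fr} = \{\mathcal{P} \in \mathcal{SP}_{i,pd} \mid \mathcal{P} \text{ has density sequence } \{a_0, a_1, \ldots, a_n, \ldots\} \text{ satisfying } \tfrac{a_0}{a_{i-1}} = e^{\epsilon}\}. \tag{183}$$

Then due to Lemma 21,
Lemma 22.

$$V^* = \inf_{\mathcal{P} \in \cup_{i=3}^{\infty}\mathcal{SP}_{i,fr}} \int_{x\in\mathbb{R}} \mathcal{L}(x)\mathcal{P}(dx). \tag{184}$$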
H. Step 7 

Next, we argue that for each probability distribution $\mathcal{P} \in \mathcal{SP}_{i,fr}$ ($i \ge 3$) with density sequence $\{a_0, a_1, \ldots, a_n, \ldots\}$, we can
assume that there exists an integer $1 \le k \le (i - 2)$, such that

\begin{align}
a_j &= a_0, \quad \forall 0 \le j < k, \tag{185} \\
a_j &= a_{i-1}, \quad \forall k < j < i. \tag{186}
\end{align}

More precisely,

Lemma 23. Consider a probability distribution $\mathcal{P}_a \in \mathcal{SP}_{i,fr}$ ($i \ge 3$) with density sequence $\{a_0, a_1, \ldots, a_n, \ldots\}$. Then there
exists a probability distribution $\mathcal{P}_b \in \mathcal{SP}_{i,fr}$ with density sequence $\{b_0, b_1, \ldots, b_n, \ldots\}$ such that there exists an integer
$1 \le k \le (i - 2)$ with

\begin{align}
b_j &= a_0, \quad \forall 0 \le j < k, \tag{187} \\
b_j &= a_{i-1}, \quad \forall k < j < i, \tag{188}
\end{align}

and

$$V(\mathcal{P}_b) \le V(\mathcal{P}_a). \tag{189}$$

Proof: If there exists an integer $1 \le k \le (i - 2)$ such that

\begin{align}
a_j &= a_0, \quad \forall 0 \le j < k, \tag{190} \\
a_j &= a_{i-1}, \quad \forall k < j < i, \tag{191}
\end{align}

then we can set $\mathcal{P}_b = \mathcal{P}_a$.

Otherwise, let $k_1$ be the smallest integer in $\{0, 1, 2, \ldots, i - 1\}$ such that

$$a_{k_1} \ne a_0, \tag{192}$$

and let $k_2$ be the biggest integer in $\{0, 1, 2, \ldots, i - 1\}$ such that

$$a_{k_2} \ne a_{i-1}. \tag{193}$$

It is easy to see that $k_1 \ne k_2$. Then we can increase $a_{k_1}$ and decrease $a_{k_2}$ simultaneously by the same amount to derive a
new probability distribution $\mathcal{P}_b \in \mathcal{SP}_{i,fr}$ with smaller cost. Indeed, if

$$a_0 - a_{k_1} \le a_{k_2} - a_{i-1}, \tag{194}$$

then consider a probability distribution $\mathcal{P}_b \in \mathcal{SP}_{i,fr}$ with density sequence $\{b_0, b_1, \ldots, b_n, \ldots\}$ defined as

\begin{align}
b_j &= a_0, \quad \forall 0 \le j \le k_1, \tag{195} \\
b_j &= a_j, \quad \forall k_1 < j \le k_2 - 1, \tag{196} \\
b_{k_2} &= a_{k_2} - (a_0 - a_{k_1}), \tag{197} \\
b_j &= a_j, \quad \forall k_2 < j \le i - 1. \tag{198}
\end{align}

We can verify that $V(\mathcal{P}_a) \ge V(\mathcal{P}_b)$ via

\begin{align}
& V(\mathcal{P}_b) - V(\mathcal{P}_a) \tag{199} \\
&= \int_{x\in\mathbb{R}} \mathcal{L}(x)\mathcal{P}_b(dx) - \int_{x\in\mathbb{R}} \mathcal{L}(x)\mathcal{P}_a(dx) \tag{200} \\
&= 2(w_{k_1}b_{k_1} + w_{k_2}b_{k_2}) - 2(w_{k_1}a_{k_1} + w_{k_2}a_{k_2}) \tag{201} \\
&= 2w_{k_1}(a_0 - a_{k_1}) + 2w_{k_2}(a_{k_2} - (a_0 - a_{k_1}) - a_{k_2}) \tag{202} \\
&= 2(a_0 - a_{k_1})(w_{k_1} - w_{k_2}) \tag{203} \\
&\le 0, \tag{204}
\end{align}

where $w_k$ is defined in (164).

If $a_0 - a_{k_1} \ge a_{k_2} - a_{i-1}$, then accordingly we can construct $\mathcal{P}_b \in \mathcal{SP}_{i,fr}$ by setting

\begin{align}
b_j &= a_0, \quad \forall 0 \le j < k_1, \tag{205} \\
b_{k_1} &= a_{k_1} + (a_{k_2} - a_{i-1}), \tag{206} \\
b_j &= a_j, \quad \forall k_1 < j \le k_2 - 1, \tag{207} \\
b_j &= a_{i-1}, \quad \forall k_2 \le j \le i - 1. \tag{208}
\end{align}

And similarly, it is easy to verify that $V(\mathcal{P}_a) \ge V(\mathcal{P}_b)$.

Therefore, continuing in this way, we finally obtain a probability distribution $\mathcal{P}_b \in \mathcal{SP}_{i,fr}$ with density sequence
$\{b_0, b_1, \ldots, b_n, \ldots\}$ such that (187), (188) and (189) hold.
This completes the proof.

■

Define

$$\mathcal{SP}_{i,step} = \{\mathcal{P} \in \mathcal{SP}_{i,fr} \mid \mathcal{P} \text{ has density sequence } \{a_0, a_1, \ldots, a_n, \ldots\} \text{ satisfying (187) and (188) for some } 1 \le k \le (i - 2)\}. \tag{209}$$

Then due to Lemma 23,
Lemma 24.

$$V^* = \inf_{\mathcal{P} \in \cup_{i=3}^{\infty}\mathcal{SP}_{i,step}} \int_{x\in\mathbb{R}} \mathcal{L}(x)\mathcal{P}(dx). \tag{210}$$

I. Step 8 

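Proof of Theorem 1: Since $\{\mathcal{P}_\gamma \mid \gamma \in [0, 1]\} \subseteq \mathcal{SP}$, we have

$$V^* = \inf_{\mathcal{P} \in \mathcal{SP}} \int_{x\in\mathbb{R}} \mathcal{L}(x)\mathcal{P}(dx) \le \inf_{\gamma\in[0,1]} \int_{x\in\mathbb{R}} \mathcal{L}(x)\mathcal{P}_\gamma(dx). \tag{211}$$

We prove the reverse direction in the following.

We first prove that for any $\mathcal{P} \in \mathcal{SP}_{i,step}$ ($i \ge 3$), there exists $\gamma \in [0, 1]$ such that

$$\int_{x\in\mathbb{R}} \mathcal{L}(x)\mathcal{P}_\gamma(dx) \le \int_{x\in\mathbb{R}} \mathcal{L}(x)\mathcal{P}(dx). \tag{212}$$

Consider the density sequence $\{a_0, a_1, \ldots, a_n, \ldots\}$ of $\mathcal{P}$. Since $\mathcal{P} \in \mathcal{SP}_{i,step}$, there exists an integer $0 \le k \le i - 2$ such
that

\begin{align}
a_j &= a_0, \quad \forall 0 \le j < k, \tag{213} \\
a_j &= a_0 e^{-\epsilon}, \quad \forall k < j \le i - 1. \tag{214}
\end{align}

Let

$$\gamma' := \frac{\frac{1 - e^{-\epsilon}}{2\Delta} - a_0 e^{-\epsilon}}{a_0(1 - e^{-\epsilon})} \in [0, 1]. \tag{215}$$

Then $a(\gamma') = a_0$.

It is easy to verify that

$$k\frac{\Delta}{i} \le \gamma'\Delta \le (k + 1)\frac{\Delta}{i}. \tag{216}$$

The probability density functions of $\mathcal{P}$ and $\mathcal{P}_{\gamma'}$ are the same when $x \in [0, \frac{k}{i}\Delta) \cup [\frac{k+1}{i}\Delta, \Delta)$. Since the integral of the probability
density functions over $[0, \Delta)$ is $\frac{1 - e^{-\epsilon}}{2}$ due to the periodically decaying property, we have

$$a_k\frac{\Delta}{i} = a_0\left(\gamma' - \frac{k}{i}\right)\Delta + e^{-\epsilon}a_0\left(\frac{k+1}{i} - \gamma'\right)\Delta. \tag{217}$$

Define $\beta := i(\gamma' - \frac{k}{i}) \in [0, 1]$. Then

$$a_k = \beta a_0 + (1 - \beta)e^{-\epsilon}a_0. \tag{218}$$

Define

\begin{align}
w_k^{(1)} &:= \sum_{j=0}^{+\infty} e^{-j\epsilon}\int_{(j + \frac{k}{i})\Delta}^{(j + \gamma')\Delta} \mathcal{L}(x)dx, \tag{219} \\
w_k^{(2)} &:= \sum_{j=0}^{+\infty} e^{-j\epsilon}\int_{(j + \gamma')\Delta}^{(j + \frac{k+1}{i})\Delta} \mathcal{L}(x)dx. \tag{220}
\end{align}

Note that $w_k = w_k^{(1)} + w_k^{(2)}$. Since $\mathcal{L}(x)$ is a monotonically increasing function when $x \ge 0$, we have

$$\frac{w_k^{(2)}}{w_k^{(1)}} \ge \frac{(j + \frac{k+1}{i})\Delta - (j + \gamma')\Delta}{(j + \gamma')\Delta - (j + \frac{k}{i})\Delta} = \frac{\frac{k+1}{i} - \gamma'}{\gamma' - \frac{k}{i}}. \tag{221}$$

Therefore,

\begin{align}
& \int_{x\in\mathbb{R}} \mathcal{L}(x)\mathcal{P}(dx) - \int_{x\in\mathbb{R}} \mathcal{L}(x)\mathcal{P}_{\gamma'}(dx) \tag{222} \\
&= 2w_k a_k - 2\left(w_k^{(1)}a_0 + w_k^{(2)}a_0 e^{-\epsilon}\right) \tag{223} \\
&= 2\left(w_k^{(1)} + w_k^{(2)}\right)a_k - 2\left(w_k^{(1)}a_0 + w_k^{(2)}a_0 e^{-\epsilon}\right) \tag{224} \\
&= 2(a_k - a_0 e^{-\epsilon})w_k^{(2)} - 2(a_0 - a_k)w_k^{(1)}. \tag{225}
\end{align}

Since

\begin{align}
\frac{a_k - a_0 e^{-\epsilon}}{a_0 - a_k} &= \frac{\beta(a_0 - a_0 e^{-\epsilon})}{(1 - \beta)(a_0 - a_0 e^{-\epsilon})} \tag{226} \\
&= \frac{\beta}{1 - \beta} \tag{227} \\
&= \frac{\gamma' - \frac{k}{i}}{\frac{k+1}{i} - \gamma'} \tag{228} \\
&\ge \frac{w_k^{(1)}}{w_k^{(2)}}, \tag{229}
\end{align}

we have

$$\int_{x\in\mathbb{R}} \mathcal{L}(x)\mathcal{P}(dx) - \int_{x\in\mathbb{R}} \mathcal{L}(x)\mathcal{P}_{\gamma'}(dx) = 2(a_k - a_0 e^{-\epsilon})w_k^{(2)} - 2(a_0 - a_k)w_k^{(1)} \ge 0.$$

Therefore,

$$V^* = \inf_{\mathcal{P} \in \cup_{i=3}^{\infty}\mathcal{SP}_{i,step}} \int_{x\in\mathbb{R}} \mathcal{L}(x)\mathcal{P}(dx) \ge \inf_{\gamma\in[0,1]} \int_{x\in\mathbb{R}} \mathcal{L}(x)\mathcal{P}_\gamma(dx).$$

We conclude

$$V^* = \inf_{\gamma\in[0,1]} \int_{x\in\mathbb{R}} \mathcal{L}(x)\mathcal{P}_\gamma(dx).$$

This completes the proof of Theorem 1.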
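To make the object of Theorem 1 concrete, here is a minimal sketch of the staircase density $f_\gamma$ with $a(\gamma) = \frac{1 - e^{-\epsilon}}{2\Delta(\gamma + e^{-\epsilon}(1 - \gamma))}$, which is the normalization implied by $a(\gamma') = a_0$ in (215). The function name `staircase_pdf` and the parameter names are hypothetical and chosen here for illustration.

```python
import math

def staircase_pdf(x, gamma, delta, eps):
    """Staircase density: a(gamma) on [0, gamma*delta), e^{-eps} * a(gamma) on
    [gamma*delta, delta), decaying by e^{-eps} per period of length delta,
    and symmetric around the origin."""
    b = math.exp(-eps)
    a = (1.0 - b) / (2.0 * delta * (gamma + b * (1.0 - gamma)))
    k, r = divmod(abs(x), delta)  # period index and offset within the period
    return a * b ** k * (1.0 if r < gamma * delta else b)
```

Because the density is piecewise constant, a midpoint Riemann sum on a grid aligned with the steps recovers its integral essentially exactly, and the pointwise ratio $f_\gamma(x) / f_\gamma(x + d)$ for $|d| \le \Delta$ never exceeds $e^{\epsilon}$.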

Appendix B 
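Proof of Theorem 2

Proof of Theorem 2: Recall $b = e^{-\epsilon}$, and $\mathcal{L}(x) = |x|$. We can compute $V(\mathcal{P}_\gamma)$ via

\begin{align}
V(\mathcal{P}_\gamma) &= \int_{x\in\mathbb{R}} |x| f_\gamma(x)dx \\
&= 2\sum_{k=0}^{+\infty}\left(\int_0^{\gamma\Delta} (x + k\Delta)a(\gamma)e^{-k\epsilon}dx + \int_{\gamma\Delta}^{\Delta} (x + k\Delta)a(\gamma)e^{-\epsilon}e^{-k\epsilon}dx\right) \\
&= 2\Delta^2 a(\gamma)\sum_{k=0}^{+\infty}\left(e^{-k\epsilon}\frac{(k + \gamma)^2 - k^2}{2} + e^{-(k+1)\epsilon}\frac{(k + 1)^2 - (k + \gamma)^2}{2}\right) \\
&= 2\Delta^2 a(\gamma)\sum_{k=0}^{+\infty}\left(\frac{b + (1 - b)\gamma^2}{2}e^{-k\epsilon} + (b + (1 - b)\gamma)ke^{-k\epsilon}\right) \\
&= 2\Delta^2 a(\gamma)\left(\frac{b}{(1 - b)^2}(b + (1 - b)\gamma) + \frac{b + (1 - b)\gamma^2}{2}\frac{1}{1 - b}\right) \tag{242} \\
&= 2\Delta^2\frac{1 - b}{2\Delta(b + (1 - b)\gamma)}\left(\frac{b}{(1 - b)^2}(b + (1 - b)\gamma) + \frac{b + (1 - b)\gamma^2}{2}\frac{1}{1 - b}\right) \\
&= \Delta\left(\frac{b}{1 - b} + \frac{1}{2}\frac{b + (1 - b)\gamma^2}{b + (1 - b)\gamma}\right),
\end{align}

where in (242) we use the formulas

$$\sum_{k=1}^{+\infty} kb^k = \frac{b}{(1 - b)^2}, \qquad \sum_{k=0}^{+\infty} b^k = \frac{1}{1 - b}.$$

Note that the first term $\frac{b}{1 - b}$ is independent of $\gamma$. Define

$$g(\gamma) := \frac{b + (1 - b)\gamma^2}{b + (1 - b)\gamma},$$

and thus to minimize $V(\mathcal{P}_\gamma)$ over $\gamma \in [0, 1]$, we only need to minimize $g(\gamma)$ over $\gamma \in [0, 1]$.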
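The minimization of $g(\gamma)$ can be cross-checked numerically. The sketch below evaluates the closed-form expected noise amplitude $\Delta\left(\frac{b}{1-b} + \frac{1}{2} g(\gamma)\right)$ on a grid; setting $g'(\gamma) = 0$ gives the quadratic $(1 - b)\gamma^2 + 2b\gamma - b = 0$, whose root in $[0, 1]$ is $\gamma = \frac{\sqrt{b}}{1 + \sqrt{b}}$, and the grid minimizer is compared against that root. The function names `expected_amplitude` and `argmin_on_grid` are hypothetical.

```python
import math

def expected_amplitude(gamma, delta, eps):
    """Closed-form E|X| under the staircase density with parameter gamma."""
    b = math.exp(-eps)
    g = (b + (1 - b) * gamma ** 2) / (b + (1 - b) * gamma)
    return delta * (b / (1 - b) + 0.5 * g)

def argmin_on_grid(f, steps=100000):
    """Minimizer of f over the uniform grid {0, 1/steps, ..., 1}."""
    return min((j / steps for j in range(steps + 1)), key=f)

eps, delta = 1.0, 1.0
b = math.exp(-eps)
gamma_grid = argmin_on_grid(lambda g: expected_amplitude(g, delta, eps))
gamma_closed = math.sqrt(b) / (1 + math.sqrt(b))
```

For comparison, Laplacian noise with the same $\epsilon$ and $\Delta$ has expected amplitude $\Delta / \epsilon$, which is larger than `expected_amplitude(gamma_closed, delta, eps)` in this example.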
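Note that the first term $\frac{b^2 + b}{(1 - b)^2}$ is independent of $\gamma$. Define

\begin{align}
h(\gamma) &:= \frac{b + (1 - b)\gamma^2}{b + (1 - b)\gamma}\frac{b}{1 - b} + \frac{1}{3}\frac{b + (1 - b)\gamma^3}{b + (1 - b)\gamma} \tag{265} \\
&= \frac{\frac{1 - b}{3}\gamma^3 + b\gamma^2 + \frac{b^2}{1 - b} + \frac{b}{3}}{b + (1 - b)\gamma}, \tag{266}
\end{align}

and thus to minimize $V(\mathcal{P}_\gamma)$ over $\gamma \in [0, 1]$, we only need to minimize $h(\gamma)$ over $\gamma \in [0, 1]$.

Since $\gamma \in [0, 1]$, $h(\gamma) \le \frac{b}{1 - b} + \frac{1}{3}$. Also note that $h(0) = h(1) = \frac{b}{1 - b} + \frac{1}{3}$. So the optimal $\gamma^*$ which minimizes $h(\gamma)$ lies in
$(0, 1)$.

Compute the derivative of $h(\gamma)$ via

\begin{align}
h'(\gamma) &= \frac{((1 - b)\gamma^2 + 2b\gamma)(b + (1 - b)\gamma) - \left(\frac{1 - b}{3}\gamma^3 + b\gamma^2 + \frac{b^2}{1 - b} + \frac{b}{3}\right)(1 - b)}{(b + (1 - b)\gamma)^2} \tag{268} \\
&= \frac{\frac{2}{3}(1 - b)^2\gamma^3 + 2b(1 - b)\gamma^2 + 2b^2\gamma - \frac{2b^2 + b}{3}}{(b + (1 - b)\gamma)^2}. \tag{269}
\end{align}

Set $h'(\gamma^*) = 0$ and we get

$$\frac{2}{3}(1 - b)^2\gamma^{*3} + 2b(1 - b)\gamma^{*2} + 2b^2\gamma^* - \frac{2b^2 + b}{3} = 0. \tag{270}$$

Therefore, the optimal $\gamma^*$ is the real-valued root of the cubic equation (270), which is

$$\gamma^* = -\frac{b}{1 - b} + \frac{(b - 2b^2 + 2b^4 - b^5)^{1/3}}{2^{1/3}(1 - b)^2}. \tag{271}$$

We plot $\gamma^*$ as a function of $b$ in Figure 4 and we can see $\gamma^* \to \frac{1}{2}$ as $\epsilon \to 0$, and $\gamma^* \to 0$ as $\epsilon \to +\infty$. This also holds in
the case $\mathcal{L}(x) = |x|$.

Plug (271) into (263), and we get the minimum noise power

\begin{align}
V(\mathcal{P}_{\gamma^*}) &= \Delta^2\left(\frac{b^2 + b}{(1 - b)^2} + \frac{b + (1 - b)\gamma^{*2}}{b + (1 - b)\gamma^*}\frac{b}{1 - b} + \frac{1}{3}\frac{b + (1 - b)\gamma^{*3}}{b + (1 - b)\gamma^*}\right) \tag{272} \\
&= \Delta^2\frac{2^{-2/3}b^{2/3}(1 + b)^{2/3} + b}{(1 - b)^2}. \tag{273}
\end{align}

Due to Theorem 1, the minimum expectation of noise power is $V(\mathcal{P}_{\gamma^*}) = \Delta^2\frac{2^{-2/3}b^{2/3}(1 + b)^{2/3} + b}{(1 - b)^2}$.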
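The closed forms (271) and (273) can be checked against a direct evaluation of $h(\gamma)$. The sketch below is a numerical illustration only; the function names `h`, `gamma_star` and `min_noise_power` are hypothetical, and the check uses $\Delta = 1$ and $\epsilon = 1$ as sample values.

```python
import math

def h(gamma, b):
    """The gamma-dependent part of the noise power, as in (265)."""
    u = b + (1 - b) * gamma
    return ((b + (1 - b) * gamma ** 2) / u * b / (1 - b)
            + (b + (1 - b) * gamma ** 3) / (3 * u))

def gamma_star(b):
    """Real root (271) of the cubic first-order condition (270)."""
    return (-b / (1 - b)
            + (b - 2 * b ** 2 + 2 * b ** 4 - b ** 5) ** (1 / 3)
            / (2 ** (1 / 3) * (1 - b) ** 2))

def min_noise_power(delta, b):
    """Minimum noise power, as in (273)."""
    return delta ** 2 * (2 ** (-2 / 3) * b ** (2 / 3) * (1 + b) ** (2 / 3) + b) / (1 - b) ** 2

b = math.exp(-1.0)
g = gamma_star(b)
# Left-hand side of (270) evaluated at the closed-form root.
residual = (2 / 3) * (1 - b) ** 2 * g ** 3 + 2 * b * (1 - b) * g ** 2 + 2 * b ** 2 * g - (2 * b ** 2 + b) / 3
# Noise power (272) evaluated at the closed-form root, with delta = 1.
power_direct = (b ** 2 + b) / (1 - b) ** 2 + h(g, b)
```

For reference, Laplacian noise with the same parameters has noise power $2\Delta^2 / \epsilon^2$, which is larger than `min_noise_power(1.0, b)` at $\epsilon = 1$.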



Appendix D 
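Proof of Theorem 6

Proof of Theorem 6: Let $n = m + 1$, and define

$$c_i := \sum_{k=0}^{+\infty} b^k k^i, \tag{274}$$

for nonnegative integer $i$.
First we compute $V(\mathcal{P}_\gamma)$ via

\begin{align}
V(\mathcal{P}_\gamma) &= 2\sum_{k=0}^{+\infty}\left(\int_0^{\gamma\Delta} (x + k\Delta)^m a(\gamma)e^{-k\epsilon}dx + \int_{\gamma\Delta}^{\Delta} (x + k\Delta)^m a(\gamma)e^{-(k+1)\epsilon}dx\right) \tag{275} \\
&= 2a(\gamma)\Delta^{m+1}\sum_{k=0}^{+\infty}\left(b^k\frac{(k + \gamma)^{m+1} - k^{m+1}}{m + 1} + b^{k+1}\frac{(k + 1)^{m+1} - (k + \gamma)^{m+1}}{m + 1}\right) \tag{276} \\
&= \frac{2\Delta^n a(\gamma)}{n}\sum_{k=0}^{+\infty}\left(b^k\sum_{i=1}^{n}\binom{n}{i}\gamma^i k^{n-i} + b \cdot b^k\sum_{i=1}^{n}\binom{n}{i}(1 - \gamma^i)k^{n-i}\right) \tag{277} \\
&= \frac{2\Delta^n a(\gamma)}{n}\left(\sum_{i=1}^{n}\binom{n}{i}\gamma^i c_{n-i} + b\sum_{i=1}^{n}\binom{n}{i}(1 - \gamma^i)c_{n-i}\right) \tag{278} \\
&= \frac{2\Delta^n a(\gamma)}{n}\sum_{i=1}^{n}\binom{n}{i}c_{n-i}(\gamma^i(1 - b) + b) \tag{279} \\
&= \frac{2\Delta^n(1 - b)}{2\Delta n}\frac{\sum_{i=1}^{n}\binom{n}{i}c_{n-i}(\gamma^i(1 - b) + b)}{\gamma(1 - b) + b}. \tag{280}
\end{align}

Let $h_i(\gamma) = \frac{\gamma^i(1 - b) + b}{\gamma(1 - b) + b}$ for $i \ge 2$. Since $h_i(0) = h_i(1) = 1$ and $h_i(\gamma) < 1$ for $\gamma \in (0, 1)$, $h_i(\gamma)$ achieves the minimum value
in the open interval $(0, 1)$.

Therefore, if we define $h(\gamma) := \frac{\sum_{i=1}^{n}\binom{n}{i}c_{n-i}(\gamma^i(1 - b) + b)}{\gamma(1 - b) + b}$, the optimal $\gamma^* \in [0, 1]$, which minimizes $V(\mathcal{P}_\gamma)$, should satisfy

$$h'(\gamma^*) = 0, \tag{281}$$

where $h'(\cdot)$ denotes the first order derivative of $h(\cdot)$.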
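It is straightforward to derive the expression for $h'(\cdot)$:

\begin{align}
h'(\gamma) &= \frac{\left(\sum_{i=1}^{n}\binom{n}{i}c_{n-i}i\gamma^{i-1}(1 - b)\right)(\gamma(1 - b) + b) - (1 - b)\sum_{i=1}^{n}\binom{n}{i}c_{n-i}(\gamma^i(1 - b) + b)}{(\gamma(1 - b) + b)^2} \tag{282} \\
&= \frac{\sum_{i=1}^{n}\binom{n}{i}c_{n-i}i\gamma^i(1 - b)^2 + \sum_{i=1}^{n}\binom{n}{i}c_{n-i}i\gamma^{i-1}(1 - b)b - \sum_{i=1}^{n}\binom{n}{i}c_{n-i}\gamma^i(1 - b)^2 - \sum_{i=1}^{n}\binom{n}{i}c_{n-i}b(1 - b)}{(\gamma(1 - b) + b)^2} \tag{283} \\
&= \frac{\sum_{i=1}^{n}\binom{n}{i}c_{n-i}(i - 1)\gamma^i(1 - b)^2 + \sum_{i=1}^{n}\binom{n}{i}c_{n-i}i\gamma^{i-1}(1 - b)b - \sum_{i=1}^{n}\binom{n}{i}c_{n-i}b(1 - b)}{(\gamma(1 - b) + b)^2}. \tag{284}
\end{align}

Therefore, $\gamma^*$ should make the numerator of (284) be zero, i.e., $\gamma^*$ satisfies

$$\sum_{i=1}^{n}\binom{n}{i}c_{n-i}(i - 1)\gamma^{*i}(1 - b)^2 + \sum_{i=1}^{n}\binom{n}{i}c_{n-i}i\gamma^{*(i-1)}(1 - b)b - \sum_{i=1}^{n}\binom{n}{i}c_{n-i}b(1 - b) = 0. \tag{285}$$

Since

\begin{align}
& \sum_{i=1}^{n}\binom{n}{i}c_{n-i}(i - 1)\gamma^i(1 - b)^2 + \sum_{i=1}^{n}\binom{n}{i}c_{n-i}i\gamma^{i-1}(1 - b)b - \sum_{i=1}^{n}\binom{n}{i}c_{n-i}b(1 - b) \tag{286} \\
&= \sum_{i=1}^{n}\binom{n}{i}c_{n-i}(i - 1)\gamma^i(1 - b)^2 + \sum_{i=0}^{n-1}\binom{n}{i+1}c_{n-(i+1)}(i + 1)\gamma^i(1 - b)b - \sum_{i=1}^{n}\binom{n}{i}c_{n-i}b(1 - b) \tag{287} \\
&= c_0(n - 1)\gamma^n(1 - b)^2 + \sum_{i=1}^{n-1}\left(\binom{n}{i}c_{n-i}(i - 1)(1 - b)^2 + \binom{n}{i+1}c_{n-(i+1)}(i + 1)(1 - b)b\right)\gamma^i + nc_{n-1}(1 - b)b - \sum_{i=1}^{n}\binom{n}{i}c_{n-i}b(1 - b) \tag{288} \\
&= c_0(n - 1)\gamma^n(1 - b)^2 + \sum_{i=1}^{n-1}\left(\binom{n}{i}c_{n-i}(i - 1)(1 - b)^2 + \binom{n}{i+1}c_{n-(i+1)}(i + 1)(1 - b)b\right)\gamma^i - \sum_{i=2}^{n}\binom{n}{i}c_{n-i}b(1 - b), \tag{289}
\end{align}

$\gamma^*$ satisfies

$$c_0(n - 1)\gamma^{*n}(1 - b)^2 + \sum_{i=1}^{n-1}\left(\binom{n}{i}c_{n-i}(i - 1)(1 - b)^2 + \binom{n}{i+1}c_{n-(i+1)}(i + 1)(1 - b)b\right)\gamma^{*i} - \sum_{i=2}^{n}\binom{n}{i}c_{n-i}b(1 - b) = 0. \tag{290}$$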

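We can derive the asymptotic properties of $\gamma^*$ from (290). Before deriving the properties of $\gamma^*$, we first study the asymptotic
properties of $c_i$, which are functions of $b$.

There are closed-form formulas for $c_i$ ($i = 0, 1, 2, 3$):

$$c_0 = \sum_{k=0}^{+\infty} b^k = \frac{1}{1 - b}, \quad c_1 = \sum_{k=0}^{+\infty} b^k k = \frac{b}{(1 - b)^2}, \quad c_2 = \sum_{k=0}^{+\infty} b^k k^2 = \frac{b^2 + b}{(1 - b)^3}, \quad c_3 = \sum_{k=0}^{+\infty} b^k k^3 = \frac{b^3 + 4b^2 + b}{(1 - b)^4}.$$

In general, for $i \ge 1$,

\begin{align}
c_{i+1} &= \sum_{k=0}^{+\infty} b^k k^{i+1} = \sum_{k=1}^{+\infty} b^k k^{i+1} = b + \sum_{k=1}^{+\infty} b^{k+1}(k + 1)^{i+1}, \tag{296} \\
bc_{i+1} &= \sum_{k=0}^{+\infty} b^{k+1}k^{i+1} = \sum_{k=1}^{+\infty} b^{k+1}k^{i+1}. \tag{297}
\end{align}

Therefore,

\begin{align}
c_{i+1} - bc_{i+1} &= b + \sum_{k=1}^{+\infty} b^{k+1}\left((k + 1)^{i+1} - k^{i+1}\right) \tag{298} \\
&= b + \sum_{k=1}^{+\infty} b^{k+1}\sum_{j=0}^{i}\binom{i+1}{j}k^j \tag{299} \\
&= b + b\sum_{j=0}^{i}\binom{i+1}{j}\sum_{k=1}^{+\infty} b^k k^j, \tag{300}
\end{align}

and thus, using $\sum_{k=1}^{+\infty} b^k k^j = c_j$ for $j \ge 1$ and $\sum_{k=1}^{+\infty} b^k = c_0 - 1$,

$$c_{i+1} = \frac{b + b\sum_{j=0}^{i}\binom{i+1}{j}\sum_{k=1}^{+\infty} b^k k^j}{1 - b} = \frac{b\sum_{j=0}^{i}\binom{i+1}{j}c_j}{1 - b}. \tag{303}$$

From (303), by induction we can easily prove that

• as $b \to 0$, $c_i \to 0$, $\forall i \ge 1$;

• as $b \to 1$, $\forall i \ge 0$, $c_i \to +\infty$, $c_i = \Omega\left(\frac{1}{(1 - b)^{i+1}}\right)$ and

$$\lim_{b\to 1} \frac{c_{i+1}}{c_i}(1 - b) = i + 1. \tag{304}$$

As $b \to 0$, since $c_i \to 0$ for $i \ge 1$ and $c_0 = 1$, the last two terms of (290) go to zero, and thus from (290) we can see that
$\gamma^*$ goes to zero as well.
As $b \to 1$, since $c_i = \Omega\left(\frac{1}{(1 - b)^{i+1}}\right)$ and $\gamma^*$ is bounded by 1, the first term of (290) goes to zero, and the dominant terms in
(290) are

$$\binom{n}{2}c_{n-2}2(1 - b)b\gamma^* - \binom{n}{2}c_{n-2}b(1 - b) = 0. \tag{305}$$

Thus, in the limit we have $\gamma^* = \frac{1}{2}$. Therefore, as $b \to 1$, $\gamma^* \to \frac{1}{2}$.

This completes the proof. ■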
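The closed forms and the limit (304) for the coefficients $c_i$ can be checked numerically. The sketch below is an illustration only: `c_series` sums the defining series (274) directly with a finite number of terms, and `c_recursive` uses the recursion $c_i = \frac{b}{1 - b}\sum_{j=0}^{i-1}\binom{i}{j}c_j$ that follows from the telescoping identity (298)-(300). Both function names and the truncation length `terms` are hypothetical.

```python
from math import comb

def c_series(i, b, terms=2000):
    """Truncated evaluation of c_i = sum_{k>=0} b^k k^i (Python defines 0**0 as 1)."""
    return sum(b ** k * k ** i for k in range(terms))

def c_recursive(i, b):
    """c_i from the recursion c_i = b / (1 - b) * sum_{j<i} C(i, j) * c_j, with c_0 = 1 / (1 - b)."""
    if i == 0:
        return 1 / (1 - b)
    return b * sum(comb(i, j) * c_recursive(j, b) for j in range(i)) / (1 - b)
```

For example, at $b = 1/2$ both functions give $c_0 = 2$, $c_1 = 2$, $c_2 = 6$ and $c_3 = 26$, matching the closed forms, and for $b$ close to 1 the ratio $\frac{c_{i+1}}{c_i}(1 - b)$ is close to $i + 1$.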

Appendix E 
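Proof of Theorem 8

We first give two lemmas on the properties of $\{\nu_t\}_{t\in\mathbb{R}}$ which satisfies (48).

Lemma 25. Given $\{\nu_t\}_{t\in\mathbb{R}}$ satisfying (48), and given any scalar $\alpha \in \mathbb{R}$, consider the family of noise probability measures
$\{\nu_t^{(\alpha)}\}_{t\in\mathbb{R}}$ defined by

$$\nu_t^{(\alpha)} := \nu_{t+\alpha}, \quad \forall t \in \mathbb{R}. \tag{306}$$

Then $\{\nu_t^{(\alpha)}\}_{t\in\mathbb{R}}$ satisfies the differential privacy constraint, i.e., $\forall |t_1 - t_2| \le \Delta$,

$$\nu_{t_1}^{(\alpha)}(S) \le e^{\epsilon}\nu_{t_2}^{(\alpha)}(S + t_1 - t_2). \tag{307}$$

Furthermore, $\{\nu_t\}_{t\in\mathbb{R}}$ and $\{\nu_t^{(\alpha)}\}_{t\in\mathbb{R}}$ have the same cost, i.e.,

$$\sup_{t\in\mathbb{R}} \int_{x\in\mathbb{R}} \mathcal{L}(x)\nu_t(dx) = \sup_{t\in\mathbb{R}} \int_{x\in\mathbb{R}} \mathcal{L}(x)\nu_t^{(\alpha)}(dx). \tag{308}$$

Proof: Since by definition the family of probability measures $\{\nu_t^{(\alpha)}\}_{t\in\mathbb{R}}$ is a shifted version of $\{\nu_t\}_{t\in\mathbb{R}}$, (308) holds.
Next we show that $\{\nu_t^{(\alpha)}\}_{t\in\mathbb{R}}$ satisfies (307). Given any $t_1, t_2$ such that $|t_1 - t_2| \le \Delta$, then for any measurable set $S \subset \mathbb{R}$,
we have

\begin{align}
\nu_{t_1}^{(\alpha)}(S) &= \nu_{t_1+\alpha}(S) \tag{309} \\
&\le e^{\epsilon}\nu_{t_2+\alpha}(S + (t_1 + \alpha) - (t_2 + \alpha)) \tag{310} \\
&= e^{\epsilon}\nu_{t_2+\alpha}(S + t_1 - t_2) \tag{311} \\
&= e^{\epsilon}\nu_{t_2}^{(\alpha)}(S + t_1 - t_2). \tag{312}
\end{align}

This completes the proof. ■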
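Next we show that given a collection of families of probability measures each of which satisfies the differential privacy
constraint (48), we can take a convex combination of them to construct a new family of probability measures satisfying (48)
and the new cost is not worse. More precisely,

Lemma 26. Given a collection of a finite number of families of probability measures $\{\nu_t^{[i]}\}_{t\in\mathbb{R}}$ ($i \in \{1, 2, 3, \ldots, n\}$), such that
for each $i$, $\{\nu_t^{[i]}\}_{t\in\mathbb{R}}$ satisfies (48) and

$$\sup_{t\in\mathbb{R}} \int_{x\in\mathbb{R}} \mathcal{L}(x)\nu_t^{[i]}(dx) = Q, \quad \forall i, \tag{313}$$

for some real number $Q$, consider the family of probability measures $\{\tilde{\nu}_t\}_{t\in\mathbb{R}}$ defined by

$$\tilde{\nu}_t := \sum_{i=1}^{n} c_i\nu_t^{[i]}, \quad \forall t \in \mathbb{R}, \tag{314}$$

i.e., for any measurable set $S \subset \mathbb{R}$,

$$\tilde{\nu}_t(S) = \sum_{i=1}^{n} c_i\nu_t^{[i]}(S), \tag{315}$$

where $c_i \ge 0$, and $\sum_{i=1}^{n} c_i = 1$.

Then $\{\tilde{\nu}_t\}_{t\in\mathbb{R}}$ also satisfies the differential privacy constraint (48), and

$$\sup_{t\in\mathbb{R}} \int_{x\in\mathbb{R}} \mathcal{L}(x)\tilde{\nu}_t(dx) \le Q. \tag{316}$$

Proof: First we show that $\{\tilde{\nu}_t\}_{t\in\mathbb{R}}$ also satisfies the differential privacy constraint (48). Indeed, $\forall |t_1 - t_2| \le \Delta$, $\forall$
measurable set $S \subset \mathbb{R}$,

\begin{align}
\tilde{\nu}_{t_1}(S) &= \sum_{i=1}^{n} c_i\nu_{t_1}^{[i]}(S) \tag{317} \\
&\le \sum_{i=1}^{n} c_i e^{\epsilon}\nu_{t_2}^{[i]}(S + t_1 - t_2) \tag{318} \\
&= e^{\epsilon}\tilde{\nu}_{t_2}(S + t_1 - t_2). \tag{319}
\end{align}

Next we show that the cost of $\{\tilde{\nu}_t\}_{t\in\mathbb{R}}$ is no bigger than $Q$. Indeed, for any $t \in \mathbb{R}$,

\begin{align}
\int_{x\in\mathbb{R}} \mathcal{L}(x)\tilde{\nu}_t(dx) &= \sum_{i=1}^{n} c_i\int_{x\in\mathbb{R}} \mathcal{L}(x)\nu_t^{[i]}(dx) \tag{320} \\
&\le \sum_{i=1}^{n} c_i Q \tag{321} \\
&= Q. \tag{322}
\end{align}

Therefore,

$$\sup_{t\in\mathbb{R}} \int_{x\in\mathbb{R}} \mathcal{L}(x)\tilde{\nu}_t(dx) \le Q. \tag{323}$$

■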

Applying Lemma 25 and Lemma 26, we can prove the conjecture under the assumption that the family of probability
measures $\{\nu_t\}_{t\in\mathbb{R}}$ is piecewise constant and periodic over $t$.

Proof of Theorem 8: We first prove that for any family of probability measures $\{\nu_t\}_{t\in\mathbb{R}} \in \mathcal{K}_{T,n}$, there exists a new
family of probability measures $\{\tilde{\nu}_t\}_{t\in\mathbb{R}} \in \mathcal{K}_{T,n}$ such that $\tilde{\nu}_t = \tilde{\nu}$ for all $t \in \mathbb{R}$, i.e., the added noise is independent of the query
output $t$, and

$$\sup_{t\in\mathbb{R}} \int_{x\in\mathbb{R}} \mathcal{L}(x)\tilde{\nu}_t(dx) \le \sup_{t\in\mathbb{R}} \int_{x\in\mathbb{R}} \mathcal{L}(x)\nu_t(dx). \tag{324}$$

Indeed, consider the collection of probability measures $\{\nu_t^{(i\frac{T}{n})}\}_{t\in\mathbb{R}}$ for $i \in \{0, 1, 2, \ldots, n - 1\}$, where $\{\nu_t^{(\alpha)}\}$ is defined in
(306). Due to Lemma 25, for all $i$, $\{\nu_t^{(i\frac{T}{n})}\}_{t\in\mathbb{R}}$ satisfies the differential privacy constraint (48), and the cost is the same as the
cost of $\{\nu_t\}_{t\in\mathbb{R}}$.
Define

$$\tilde{\nu}_t = \sum_{i=0}^{n-1} \frac{1}{n}\nu_t^{(i\frac{T}{n})}. \tag{325}$$

Then due to Lemma 26, $\{\tilde{\nu}_t\}_{t\in\mathbb{R}}$ satisfies (48), and the cost of $\{\tilde{\nu}_t\}_{t\in\mathbb{R}}$ is not worse, i.e.,

$$\sup_{t\in\mathbb{R}} \int_{x\in\mathbb{R}} \mathcal{L}(x)\tilde{\nu}_t(dx) \le \sup_{t\in\mathbb{R}} \int_{x\in\mathbb{R}} \mathcal{L}(x)\nu_t(dx). \tag{326}$$

Furthermore, since $\{\nu_t\}_{t\in\mathbb{R}} \in \mathcal{K}_{T,n}$, for any $t \in \mathbb{R}$,

$$\tilde{\nu}_t = \sum_{i=0}^{n-1} \frac{1}{n}\nu_t^{(i\frac{T}{n})} = \sum_{i=0}^{n-1} \frac{1}{n}\nu_0^{(i\frac{T}{n})}. \tag{327}$$

Hence, $\tilde{\nu}_t$ is independent of $t$.

Therefore, among the collection of probability measures in $\cup_{T>0}\cup_{n\ge 1}\mathcal{K}_{T,n}$, to minimize the cost we only need to consider
the families of noise probability measures which are independent of the query output $t$. Then due to Theorem 1, the staircase
mechanism is optimal among all query-output-independent noise-adding mechanisms. This completes the proof of Theorem 8.



Appendix F 
Proof of Theorem 10 and Theorem 11

In this section, we prove Theorem 10 and Theorem 11, which give the optimal noise-adding mechanisms in the discrete
setting.

A. Outline of Proof

The proof technique is very similar to the proof in the continuous setting in Appendix A. The proof consists of 5 steps in
total, and in each step we narrow down the set of probability distributions in which the optimal probability distribution should
lie:

• Step 1 proves that we only need to consider probability mass functions which are monotonically increasing for $i \le 0$ and
monotonically decreasing for $i \ge 0$.

• Step 2 proves that we only need to consider symmetric probability mass functions.

• Step 3 proves that we only need to consider symmetric probability mass functions which have periodic and geometric
decay for $i \ge 0$, and this proves Theorem 10.

• Step 4 and Step 5 prove that the optimal probability mass function over the interval $[0, \Delta)$ is a discrete step function, and
they conclude the proof of Theorem 11.



B. Step 1 

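Recall that $\mathcal{SP}$ denotes the set of all probability mass functions which satisfy the $\epsilon$-differential privacy constraint (62). Define

$$V^* \triangleq \inf_{\mathcal{P}\in\mathcal{SP}} \sum_{i=-\infty}^{+\infty} \mathcal{L}(i)\mathcal{P}(i). \tag{328}$$

First we prove that we only need to consider probability mass functions which are monotonically increasing for $i \le 0$ and monotonically decreasing for $i \ge 0$.
Define

$$\mathcal{SP}_{\text{mono}} \triangleq \{\mathcal{P} \in \mathcal{SP} \mid \mathcal{P}(i) \le \mathcal{P}(j),\ \mathcal{P}(m) \ge \mathcal{P}(n),\ \forall i \le j \le 0,\ 0 \le m \le n\}. \tag{329}$$

Lemma 27.

$$V^* = \inf_{\mathcal{P}\in\mathcal{SP}_{\text{mono}}} \sum_{i=-\infty}^{+\infty} \mathcal{L}(i)\mathcal{P}(i). \tag{330}$$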

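Proof: We will prove that given a probability mass function $\mathcal{P}_a \in \mathcal{SP}$, we can construct a new probability mass function $\mathcal{P}_b \in \mathcal{SP}_{\text{mono}}$ such that

$$\sum_{i=-\infty}^{+\infty} \mathcal{L}(i)\mathcal{P}_a(i) \ge \sum_{i=-\infty}^{+\infty} \mathcal{L}(i)\mathcal{P}_b(i). \tag{331}$$

Given $\mathcal{P}_a \in \mathcal{SP}$, consider the sequence $sa = \{\mathcal{P}_a(0), \mathcal{P}_a(1), \mathcal{P}_a(-1), \mathcal{P}_a(2), \mathcal{P}_a(-2), \dots\}$. Using the same argument as in Lemma 15, we can show that $\mathcal{P}_a(i) > 0$, $\forall i \in \mathbb{Z}$. Let the sequence $sb = \{b_0, b_1, b_{-1}, b_2, b_{-2}, \dots\}$ be a permutation of the sequence $sa$ in descending order. Since $\sum_{i=-\infty}^{+\infty} \mathcal{P}_a(i) = 1$, we have $\lim_{i\to-\infty} \mathcal{P}_a(i) = \lim_{i\to+\infty} \mathcal{P}_a(i) = 0$, and thus $sb$ is well defined. Let $\pi$ be the corresponding permutation mapping, i.e., $\pi : \mathbb{Z} \to \mathbb{Z}$, and

$$b_i = \mathcal{P}_a(\pi(i)). \tag{332}$$

Since $\mathcal{L}(\cdot)$ is a symmetric function and monotonically increasing for $i \ge 0$, we have

$$\mathcal{L}(0) \le \mathcal{L}(1) \le \mathcal{L}(-1) \le \mathcal{L}(2) \le \mathcal{L}(-2) \le \cdots \le \mathcal{L}(i) \le \mathcal{L}(-i) \le \mathcal{L}(i+1) \le \mathcal{L}(-(i+1)) \le \cdots. \tag{333}$$

Therefore, if we define a probability mass function $\mathcal{P}_b$ with

$$\mathcal{P}_b(i) = b_i,\ \forall i \in \mathbb{Z}, \tag{334}$$

then

$$\sum_{i=-\infty}^{+\infty} \mathcal{L}(i)\mathcal{P}_a(i) \ge \sum_{i=-\infty}^{+\infty} \mathcal{L}(i)\mathcal{P}_b(i). \tag{335}$$

Next, we only need to prove $\mathcal{P}_b \in \mathcal{SP}_{\text{mono}}$, i.e., we need to show that $\mathcal{P}_b$ satisfies the differential privacy constraint (62). Due to the way we construct the sequence $sb$, we have

$$b_0 \ge b_1 \ge b_2 \ge b_3 \ge \cdots, \tag{336}$$
$$b_0 \ge b_{-1} \ge b_{-2} \ge b_{-3} \ge \cdots. \tag{337}$$

Therefore, it is both sufficient and necessary to prove that

$$\frac{b_i}{b_{i+\Delta}} \le e^{\epsilon},\ \forall i \ge 0, \tag{338}$$
$$\frac{b_i}{b_{i-\Delta}} \le e^{\epsilon},\ \forall i \le 0. \tag{339}$$

Since $\mathcal{P}_a \in \mathcal{SP}$, $\forall i \in \{\pi(0)-\Delta, \pi(0)-\Delta+1, \pi(0)-\Delta+2, \dots, \pi(0)+\Delta\}$,

$$\frac{\mathcal{P}_a(\pi(0))}{\mathcal{P}_a(i)} \le e^{\epsilon}. \tag{340}$$

Therefore, in the sequence $sb$ there exist at least $2\Delta$ elements which are no smaller than $b_0 e^{-\epsilon}$. Since $b_{-\Delta}$ and $b_{\Delta}$ are the $2\Delta$th and $(2\Delta-1)$th largest elements in the sequence $sb$ other than $b_0$, we have $\frac{b_0}{b_{-\Delta}} \le e^{\epsilon}$ and $\frac{b_0}{b_{\Delta}} \le e^{\epsilon}$.

In general, given $i \in \mathbb{Z}$, we can use Algorithm 3 to find at least $2\Delta$ elements in the sequence $sb$ which are no bigger than $b_i$ and no smaller than $b_i e^{-\epsilon}$.

More precisely, given $i \in \mathbb{Z}$, let $j_R^*$ and $j_L^*$ be the output of Algorithm 3. Note that since the while loops in Algorithm 3 can take at most $2(|i|+1)$ steps, the algorithm will always terminate. For all integers $j \in [\pi(j_L^*)-\Delta, \pi(j_L^*)-1]$, $\mathcal{P}_a(j)$ is no bigger than $b_i$ and is no smaller than $\mathcal{P}_a(\pi(j_L^*))e^{-\epsilon}$; and for all integers $j \in [\pi(j_R^*)+1, \pi(j_R^*)+\Delta]$, $\mathcal{P}_a(j)$ is no bigger than $b_i$ and is no smaller than $\mathcal{P}_a(\pi(j_R^*))e^{-\epsilon}$. Since $\mathcal{P}_a(\pi(j_R^*)), \mathcal{P}_a(\pi(j_L^*)) \ge b_i$, for all $j \in [\pi(j_L^*)-\Delta, \pi(j_L^*)-1] \cup [\pi(j_R^*)+1, \pi(j_R^*)+\Delta]$, $\mathcal{P}_a(j)$ is no bigger than $b_i$ and is no smaller than $b_i e^{-\epsilon}$. Therefore, there exist at least $2\Delta$ elements in the sequence $sb$ which are no bigger than $b_i$ and no smaller than $b_i e^{-\epsilon}$.

If $i \le 0$, then $b_{i-\Delta}$ is the $2\Delta$th largest element in the sequence $sb$ which is no bigger than $b_i$ and no smaller than $b_i e^{-\epsilon}$; and if $i \ge 0$, then $b_{i+\Delta}$ is the $(2\Delta-1)$th largest element in the sequence $sb$ which is no bigger than $b_i$ and no smaller than $b_i e^{-\epsilon}$. Therefore, we have

$$\frac{b_i}{b_{i+\Delta}} \le e^{\epsilon},\ \forall i \ge 0, \tag{341}$$
$$\frac{b_i}{b_{i-\Delta}} \le e^{\epsilon},\ \forall i \le 0. \tag{342}$$

This completes the proof of Lemma 27.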



Algorithm 3

    j_R^* ← i
    while there exists some j which appears before i in the sequence {0, 1, −1, 2, −2, ...} and π(j) ∈ [π(j_R^*) + 1, π(j_R^*) + Δ]
    do
        j_R^* ← j
    end while

    j_L^* ← i
    while there exists some j which appears before i in the sequence {0, 1, −1, 2, −2, ...} and π(j) ∈ [π(j_L^*) − Δ, π(j_L^*) − 1]
    do
        j_L^* ← j
    end while

    Output j_R^* and j_L^*.
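
The rearrangement used in the proof of Lemma 27 can be sketched on a finite example. The following is a minimal illustration, assuming a hypothetical probability mass function on the truncated support $\{-3, \dots, 3\}$ and the cost $\mathcal{L}(i) = |i|$; the helper names `zigzag`, `rearrange` and `cost` are ours. It checks only the cost inequality (335), not the privacy constraint, which requires full support on $\mathbb{Z}$.

```python
def zigzag(m):
    """Indices in the order 0, 1, -1, 2, -2, ..., m, -m."""
    order = [0]
    for i in range(1, m + 1):
        order += [i, -i]
    return order

def rearrange(pmf):
    """Assign the values of pmf, sorted in descending order, to 0, 1, -1, 2, -2, ..."""
    m = max(abs(i) for i in pmf)
    values = sorted(pmf.values(), reverse=True)
    return dict(zip(zigzag(m), values))

def cost(pmf, loss=abs):
    """Expected cost: sum over i of loss(i) * P(i)."""
    return sum(loss(i) * p for i, p in pmf.items())

# Hypothetical non-monotone PMF on {-3, ..., 3} (assumption for illustration only).
p_a = {-3: 0.05, -2: 0.20, -1: 0.10, 0: 0.15, 1: 0.30, 2: 0.05, 3: 0.15}
p_b = rearrange(p_a)

print(cost(p_a), cost(p_b))
```

Because the largest probabilities are paired with the smallest costs in (333), the rearranged function can only lower the cost.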



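C. Step 2

Next we prove that we only need to consider symmetric probability mass functions which are monotonically decreasing when $i \ge 0$.
Define

$$\mathcal{SP}_{\text{sym}} \triangleq \{\mathcal{P} \in \mathcal{SP}_{\text{mono}} \mid \mathcal{P}(i) = \mathcal{P}(-i),\ \forall i \in \mathbb{Z}\}. \tag{343}$$

Lemma 28.

$$V^* = \inf_{\mathcal{P}\in\mathcal{SP}_{\text{sym}}} \sum_{i=-\infty}^{+\infty} \mathcal{L}(i)\mathcal{P}(i). \tag{344}$$

Proof: The proof is essentially the same as the proof of Lemma 13.
Given $\mathcal{P}_a \in \mathcal{SP}_{\text{mono}}$, define a new probability mass function $\mathcal{P}_b$ with

$$\mathcal{P}_b(i) \triangleq \frac{\mathcal{P}_a(i) + \mathcal{P}_a(-i)}{2},\ \forall i \in \mathbb{Z}. \tag{345}$$

It is easy to see that $\mathcal{P}_b$ is a valid probability mass function and symmetric. Since the cost function $\mathcal{L}(\cdot)$ is symmetric,

$$\sum_{i=-\infty}^{+\infty} \mathcal{L}(i)\mathcal{P}_a(i) = \sum_{i=-\infty}^{+\infty} \mathcal{L}(i)\mathcal{P}_b(i). \tag{346}$$

Next we show that $\mathcal{P}_b$ also satisfies the differential privacy constraint (62). For any $i \in \mathbb{Z}$ and $|d| \le \Delta$, since $\mathcal{P}_a(i) \le e^{\epsilon}\mathcal{P}_a(i+d)$ and $\mathcal{P}_a(-i) \le e^{\epsilon}\mathcal{P}_a(-i-d)$, we have

$$\mathcal{P}_b(i) = \frac{\mathcal{P}_a(i) + \mathcal{P}_a(-i)}{2} \tag{347}$$
$$\le \frac{e^{\epsilon}\mathcal{P}_a(i+d) + e^{\epsilon}\mathcal{P}_a(-i-d)}{2} \tag{348}$$
$$= e^{\epsilon}\mathcal{P}_b(i+d). \tag{349}$$

Therefore, $\mathcal{P}_b$ satisfies (62).
Finally, for any $0 \le i \le j$,

$$\mathcal{P}_b(i) = \frac{\mathcal{P}_a(i) + \mathcal{P}_a(-i)}{2} \tag{350}$$
$$\ge \frac{\mathcal{P}_a(j) + \mathcal{P}_a(-j)}{2} \tag{351}$$
$$= \mathcal{P}_b(j). \tag{352}$$

So $\mathcal{P}_b \in \mathcal{SP}_{\text{mono}}$, and thus $\mathcal{P}_b \in \mathcal{SP}_{\text{sym}}$. We conclude

$$V^* = \inf_{\mathcal{P}\in\mathcal{SP}_{\text{sym}}} \sum_{i=-\infty}^{+\infty} \mathcal{L}(i)\mathcal{P}(i). \tag{353}$$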
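
The symmetrization (345) is easy to check numerically. The following is a minimal sketch, assuming a hypothetical asymmetric probability mass function truncated to $\{-40, \dots, 40\}$, with $\epsilon = 1$, $\Delta = 2$ and cost $\mathcal{L}(i) = |i|$; the names `symmetrize`, `satisfies_dp` and `cost` are ours, and the privacy check only covers index pairs inside the truncated window.

```python
import math

def symmetrize(pmf):
    """P_b(i) = (P_a(i) + P_a(-i)) / 2, as in (345)."""
    return {i: 0.5 * (pmf[i] + pmf[-i]) for i in pmf}

def satisfies_dp(pmf, eps, delta):
    """Check P(i) <= e^eps * P(i + d) for |d| <= delta, both indices inside the window."""
    bound = math.exp(eps) * (1 + 1e-12)  # small slack for floating-point rounding
    return all(pmf[i] <= bound * pmf[i + d]
               for i in pmf for d in range(-delta, delta + 1) if i + d in pmf)

def cost(pmf, loss=abs):
    return sum(loss(i) * p for i, p in pmf.items())

eps, delta, m = 1.0, 2, 40
# Hypothetical asymmetric weights: the right tail decays twice as fast as the left tail.
p_a = {i: math.exp(-eps * i / delta) if i >= 0 else math.exp(eps * i / (2 * delta))
       for i in range(-m, m + 1)}
total = sum(p_a.values())
p_a = {i: p / total for i, p in p_a.items()}
p_b = symmetrize(p_a)

print(satisfies_dp(p_a, eps, delta), satisfies_dp(p_b, eps, delta))
print(cost(p_a), cost(p_b))
```

The two costs agree because $\mathcal{L}$ is symmetric, and the privacy constraint survives because it is preserved under averaging, exactly as in (347) to (349).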



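D. Step 3

Next we show that among all symmetric and monotonically decreasing (for $i \ge 0$) probability mass functions, we only need to consider those which are periodically and geometrically decaying.
More precisely, define

$$\mathcal{SP}_{\text{pd}} \triangleq \left\{\mathcal{P} \in \mathcal{SP}_{\text{sym}} \,\middle|\, \frac{\mathcal{P}(i)}{\mathcal{P}(i+\Delta)} = e^{\epsilon},\ \forall i \in \mathbb{N}\right\}. \tag{354}$$

Then

Lemma 29.

$$V^* = \inf_{\mathcal{P}\in\mathcal{SP}_{\text{pd}}} V(\mathcal{P}). \tag{355}$$

Proof: Due to Lemma 28, we only need to consider probability mass functions which are symmetric and monotonically decreasing for $i \ge 0$.

We first show that given $\mathcal{P}_a \in \mathcal{SP}_{\text{sym}}$, if $\frac{\mathcal{P}_a(0)}{\mathcal{P}_a(\Delta)} < e^{\epsilon}$, then we can construct a probability mass function $\mathcal{P}_b \in \mathcal{SP}_{\text{sym}}$ such that $\frac{\mathcal{P}_b(0)}{\mathcal{P}_b(\Delta)} = e^{\epsilon}$ and

$$V(\mathcal{P}_a) \ge V(\mathcal{P}_b). \tag{356}$$

Since $\mathcal{P}_a$ is symmetric,

$$V(\mathcal{P}_a) = \mathcal{L}(0)\mathcal{P}_a(0) + 2\sum_{i=1}^{+\infty} \mathcal{L}(i)\mathcal{P}_a(i). \tag{357}$$

Suppose $\frac{\mathcal{P}_a(0)}{\mathcal{P}_a(\Delta)} < e^{\epsilon}$; then define a new symmetric probability mass function $\mathcal{P}_b$ with

$$\mathcal{P}_b(0) \triangleq (1+\delta)\mathcal{P}_a(0), \tag{358}$$
$$\mathcal{P}_b(i) \triangleq (1-\delta')\mathcal{P}_a(i),\ \forall i \in \mathbb{Z}\setminus\{0\}, \tag{359}$$

where

$$\delta \triangleq \frac{e^{\epsilon}\frac{\mathcal{P}_a(\Delta)}{\mathcal{P}_a(0)} - 1}{1 + e^{\epsilon}\frac{\mathcal{P}_a(\Delta)}{1-\mathcal{P}_a(0)}} > 0, \tag{360}$$
$$\delta' \triangleq \frac{e^{\epsilon}\mathcal{P}_a(\Delta) - \mathcal{P}_a(0)}{1 - \mathcal{P}_a(0) + e^{\epsilon}\mathcal{P}_a(\Delta)} > 0, \tag{361}$$

so that $\delta\,\mathcal{P}_a(0) = \delta'(1-\mathcal{P}_a(0))$ (the total mass is preserved) and $\frac{\mathcal{P}_b(0)}{\mathcal{P}_b(\Delta)} = e^{\epsilon}$.

It is easy to see that $\mathcal{P}_b \in \mathcal{SP}_{\text{sym}}$, and

$$V(\mathcal{P}_b) - V(\mathcal{P}_a) \tag{362}$$
$$= \delta\mathcal{L}(0)\mathcal{P}_a(0) - 2\delta' \sum_{i=1}^{+\infty} \mathcal{L}(i)\mathcal{P}_a(i) \tag{363}$$
$$\le \delta\mathcal{L}(0)\mathcal{P}_a(0) - 2\delta' \sum_{i=1}^{+\infty} \mathcal{L}(0)\mathcal{P}_a(i) \tag{364}$$
$$= \delta\mathcal{L}(0)\mathcal{P}_a(0) - \delta'\mathcal{L}(0)(1 - \mathcal{P}_a(0)) \tag{365}$$
$$= 0. \tag{366}$$

Therefore, we only need to consider $\mathcal{P} \in \mathcal{SP}_{\text{sym}}$ satisfying $\frac{\mathcal{P}(0)}{\mathcal{P}(\Delta)} = e^{\epsilon}$.

By using the same argument as in the proof of Lemma 20, one can conclude that we only need to consider $\mathcal{P} \in \mathcal{SP}_{\text{sym}}$ satisfying

$$\frac{\mathcal{P}(i)}{\mathcal{P}(i+\Delta)} = e^{\epsilon},\ \forall i \in \mathbb{N}. \tag{367}$$

Therefore, $V^* = \inf_{\mathcal{P}\in\mathcal{SP}_{\text{pd}}} V(\mathcal{P})$.

Proof of Theorem 10: In the case that $\Delta = 1$, due to Lemma 29, the symmetry property and (367) completely characterize the optimal noise probability mass function, which is the geometric mechanism. ∎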
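
For $\Delta = 1$, symmetry together with the decay condition (367) gives $\mathcal{P}(i) = \mathcal{P}(0)\,b^{|i|}$ with $b = e^{-\epsilon}$, and normalization then fixes $\mathcal{P}(0) = \frac{1-b}{1+b}$. The following is a minimal numerical check of this closed form, assuming $\epsilon = 0.5$ and a truncation of the support to $\{-200, \dots, 200\}$; the function name `geometric_pmf` is ours.

```python
import math

def geometric_pmf(i, eps):
    """Two-sided geometric PMF: P(i) = (1 - b) / (1 + b) * b^{|i|}, with b = e^{-eps}."""
    b = math.exp(-eps)
    return (1 - b) / (1 + b) * b ** abs(i)

eps = 0.5
# Truncated sum over {-200, ..., 200}; the neglected tail is of order e^{-100}.
total = sum(geometric_pmf(i, eps) for i in range(-200, 201))
# Ratio of neighbouring values for Delta = 1, which should equal e^{eps} as in (367).
ratio = geometric_pmf(3, eps) / geometric_pmf(4, eps)

print(total, ratio, math.exp(eps))
```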



E. Step 4 

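Due to Lemma 29, the optimal probability mass function $\mathcal{P}$ is completely characterized by $\mathcal{P}(0), \mathcal{P}(1), \dots, \mathcal{P}(\Delta-1)$. Next we derive the properties of the optimal probability mass function in the domain $\{0, 1, 2, \dots, \Delta-1\}$.
Since Lemma 29 solves the case $\Delta = 1$, in the remainder of this section we assume $\Delta \ge 2$.
Define

$$\mathcal{SP}_{\text{step}_\lambda} \triangleq \{\mathcal{P} \in \mathcal{SP}_{\text{pd}} \mid \exists k \in \{0, 1, \dots, \Delta-2\},\ \mathcal{P}(i) = \mathcal{P}(0),\ \forall i \in \{0, 1, \dots, k\},\ \mathcal{P}(j) = \lambda\mathcal{P}(0),\ \forall j \in \{k+1, k+2, \dots, \Delta-1\}\}. \tag{368}$$

Lemma 30.

$$V^* = \inf_{\mathcal{P} \in \bigcup_{\lambda\in[e^{-\epsilon},1]} \mathcal{SP}_{\text{step}_\lambda}} V(\mathcal{P}). \tag{369}$$

Proof: If $\Delta = 2$, then for any $\mathcal{P} \in \mathcal{SP}_{\text{pd}}$, we can set $k = 0$ and $\lambda = \frac{\mathcal{P}(\Delta-1)}{\mathcal{P}(0)}$, so that $\mathcal{P} \in \mathcal{SP}_{\text{step}_\lambda}$. Therefore, Lemma 30 holds for $\Delta = 2$.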

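Assume $\Delta \ge 3$. First, we prove that we only need to consider probability mass functions $\mathcal{P} \in \mathcal{SP}_{\text{pd}}$ such that there exists $k \in \{1, 2, \dots, \Delta-2\}$ with

$$\mathcal{P}(i) = \mathcal{P}(0),\ \forall i \in \{0, 1, \dots, k-1\}, \tag{370}$$
$$\mathcal{P}(j) = \mathcal{P}(\Delta-1),\ \forall j \in \{k+1, k+2, \dots, \Delta-1\}. \tag{371}$$

More precisely, given $\mathcal{P}_a \in \mathcal{SP}_{\text{pd}}$, we can construct a probability mass function $\mathcal{P}_b \in \mathcal{SP}_{\text{pd}}$ such that there exists $k$ satisfying (370) and (371), and $V(\mathcal{P}_b) \le V(\mathcal{P}_a)$.

The proof technique is very similar to the proof of Lemma 23. Suppose there does not exist such $k$ for $\mathcal{P}_a$; then let $k_1$ be the smallest integer in $\{1, 2, \dots, \Delta-1\}$ such that

$$\mathcal{P}_a(k_1) \ne \mathcal{P}_a(0), \tag{372}$$

and let $k_2$ be the biggest integer in $\{0, 1, \dots, \Delta-2\}$ such that

$$\mathcal{P}_a(k_2) \ne \mathcal{P}_a(\Delta-1). \tag{373}$$

It is easy to see that $k_1 < k_2$ and $k_1 \ne 0$. Then we can increase $\mathcal{P}_a(k_1)$ and decrease $\mathcal{P}_a(k_2)$ simultaneously by the same amount to derive a new probability mass function $\mathcal{P}_b \in \mathcal{SP}_{\text{pd}}$ with smaller cost. Indeed, if

$$\mathcal{P}_a(0) - \mathcal{P}_a(k_1) \le \mathcal{P}_a(k_2) - \mathcal{P}_a(\Delta-1), \tag{374}$$

then consider a probability mass function $\mathcal{P}_b \in \mathcal{SP}_{\text{pd}}$ with

$$\mathcal{P}_b(i) = \mathcal{P}_a(0),\ \forall\, 0 \le i \le k_1, \tag{375}$$
$$\mathcal{P}_b(i) = \mathcal{P}_a(i),\ \forall\, k_1 < i < k_2, \tag{376}$$
$$\mathcal{P}_b(k_2) = \mathcal{P}_a(k_2) - (\mathcal{P}_a(0) - \mathcal{P}_a(k_1)), \tag{377}$$
$$\mathcal{P}_b(i) = \mathcal{P}_a(i),\ \forall\, k_2 < i \le \Delta-1. \tag{378}$$

Define

$$w_0 \triangleq \mathcal{L}(0) + 2\sum_{k=1}^{\infty} \mathcal{L}(k\Delta)e^{-k\epsilon}, \tag{379}$$
$$w_i \triangleq 2\sum_{k=0}^{\infty} \mathcal{L}(i + k\Delta)e^{-k\epsilon},\ \forall i \in \{1, 2, \dots, \Delta-1\}, \tag{380}$$

so that $V(\mathcal{P}) = \sum_{i=0}^{\Delta-1} \mathcal{P}(i)w_i$ for every $\mathcal{P} \in \mathcal{SP}_{\text{pd}}$.

Note that since $\mathcal{L}(\cdot)$ is a monotonically increasing function when $i \ge 0$, we have $w_0 \le w_1 \le \cdots \le w_{\Delta-1}$.
Then we can verify that $V(\mathcal{P}_b) \le V(\mathcal{P}_a)$ via

$$V(\mathcal{P}_b) - V(\mathcal{P}_a) \tag{381}$$
$$= \sum_{i=0}^{\Delta-1} \mathcal{P}_b(i)w_i - \sum_{i=0}^{\Delta-1} \mathcal{P}_a(i)w_i \tag{382}$$
$$= (\mathcal{P}_a(0) - \mathcal{P}_a(k_1))(w_{k_1} - w_{k_2}) \tag{383}$$
$$\le 0. \tag{384}$$

If

$$\mathcal{P}_a(0) - \mathcal{P}_a(k_1) \ge \mathcal{P}_a(k_2) - \mathcal{P}_a(\Delta-1), \tag{385}$$

then we can define $\mathcal{P}_b \in \mathcal{SP}_{\text{pd}}$ by setting

$$\mathcal{P}_b(i) = \mathcal{P}_a(0),\ \forall\, 0 \le i < k_1, \tag{386}$$
$$\mathcal{P}_b(k_1) = \mathcal{P}_a(k_1) + (\mathcal{P}_a(k_2) - \mathcal{P}_a(\Delta-1)), \tag{387}$$
$$\mathcal{P}_b(i) = \mathcal{P}_a(i),\ \forall\, k_1 < i < k_2, \tag{388}$$
$$\mathcal{P}_b(i) = \mathcal{P}_a(\Delta-1),\ \forall\, k_2 \le i \le \Delta-1. \tag{389}$$

And similarly, we have

$$V(\mathcal{P}_b) - V(\mathcal{P}_a) = (\mathcal{P}_a(k_2) - \mathcal{P}_a(\Delta-1))(w_{k_1} - w_{k_2}) \le 0. \tag{390}$$

Therefore, continuing in this way, we will finally obtain a probability mass function $\mathcal{P}_b \in \mathcal{SP}_{\text{pd}}$ such that there exists $k$ satisfying (370) and (371), and $V(\mathcal{P}_b) \le V(\mathcal{P}_a)$.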
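
The weights (379), (380) and one mass-shifting step can be checked numerically. The following is a minimal sketch, assuming $\Delta = 4$, $\epsilon = 1$, the cost $\mathcal{L}(i) = |i|$, series truncated after 400 terms, and hypothetical unnormalized first-period values; since the shift preserves the total mass, the common normalizing constant does not affect the comparison. The names `weights` and `period_cost` are ours.

```python
import math

def weights(delta, eps, loss=abs, terms=400):
    """Truncated versions of w_0, ..., w_{delta-1} from (379) and (380)."""
    b = math.exp(-eps)
    w = [loss(0) + 2 * sum(loss(k * delta) * b ** k for k in range(1, terms))]
    w += [2 * sum(loss(i + k * delta) * b ** k for k in range(terms))
          for i in range(1, delta)]
    return w

def period_cost(first_period, w):
    """Cost of a PMF in SP_pd, up to normalization: sum_i P(i) * w_i over one period."""
    return sum(p * wi for p, wi in zip(first_period, w))

delta, eps = 4, 1.0
w = weights(delta, eps)

# Hypothetical first-period values (assumption): here k1 = 1 and k2 = 2, and
# P_a(0) - P_a(k1) = 0.1 <= P_a(k2) - P_a(delta - 1) = 0.2, so case (374) applies.
p_a = [1.0, 0.9, 0.7, 0.5]
# Shift the mass P_a(0) - P_a(k1) from index k2 to index k1, as in (375)-(378).
p_b = [1.0, 1.0, 0.6, 0.5]

diff = period_cost(p_b, w) - period_cost(p_a, w)
print(w, diff)
```

The difference equals $(\mathcal{P}_a(0) - \mathcal{P}_a(k_1))(w_{k_1} - w_{k_2})$, which is non-positive because the weights are non-decreasing in $i$.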

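From the above argument, we can see that in the optimal solution $\mathcal{P}^* \in \mathcal{SP}_{\text{pd}}$, the probability mass function can take at most three distinct values for all $i \in \{0, 1, \dots, \Delta-1\}$, which are $\mathcal{P}^*(0)$, $\mathcal{P}^*(k)$ and $\mathcal{P}^*(\Delta-1)$. Next we show that indeed either $\mathcal{P}^*(k) = \mathcal{P}^*(0)$ or $\mathcal{P}^*(k) = \mathcal{P}^*(\Delta-1)$, and this will complete the proof of Lemma 30.

The optimal probability mass function $\mathcal{P} \in \mathcal{SP}_{\text{pd}}$ can be specified by four parameters: $\mathcal{P}(0)$, $\lambda \in [e^{-\epsilon}, 1]$, $k \in \{1, 2, \dots, \Delta-2\}$ and $\mathcal{P}(k)$. We will show that when $k$ and $\lambda$ are fixed, to minimize the cost, we have either $\mathcal{P}(k) = \mathcal{P}(0)$ or $\mathcal{P}(k) = \mathcal{P}(\Delta-1) = \lambda\mathcal{P}(0)$.

Since $\sum_{i=-\infty}^{+\infty} \mathcal{P}(i) = 1$ (recall $b = e^{-\epsilon}$),

$$2\,\frac{k\mathcal{P}(0) + \mathcal{P}(k) + (\Delta-k-1)\lambda\mathcal{P}(0)}{1-b} - \mathcal{P}(0) = 1, \tag{391}$$

and thus $\mathcal{P}(k) = \frac{(1+\mathcal{P}(0))(1-b) - 2\mathcal{P}(0)k - 2\lambda\mathcal{P}(0)(\Delta-k-1)}{2}$.

The cost for $\mathcal{P}$ is

$$V(\mathcal{P}) = \mathcal{P}(0)\sum_{i=0}^{k-1} w_i + \mathcal{P}(\Delta-1)\sum_{i=k+1}^{\Delta-1} w_i + \mathcal{P}(k)w_k \tag{392}$$
$$= \mathcal{P}(0)\sum_{i=0}^{k-1} w_i + \lambda\mathcal{P}(0)\sum_{i=k+1}^{\Delta-1} w_i + \frac{(1+\mathcal{P}(0))(1-b) - 2\mathcal{P}(0)k - 2\lambda\mathcal{P}(0)(\Delta-k-1)}{2}\,w_k, \tag{393}$$

which is a linear function of the parameter $\mathcal{P}(0)$.
Since $\mathcal{P}(k) \ge \lambda\mathcal{P}(0)$ and $\mathcal{P}(k) \le \mathcal{P}(0)$, we have

$$2\,\frac{k\mathcal{P}(0) + \mathcal{P}(k) + (\Delta-k-1)\lambda\mathcal{P}(0)}{1-b} - \mathcal{P}(0) = 1 \le 2\,\frac{k\mathcal{P}(0) + \mathcal{P}(0) + (\Delta-k-1)\lambda\mathcal{P}(0)}{1-b} - \mathcal{P}(0), \tag{394}$$
$$2\,\frac{k\mathcal{P}(0) + \mathcal{P}(k) + (\Delta-k-1)\lambda\mathcal{P}(0)}{1-b} - \mathcal{P}(0) = 1 \ge 2\,\frac{k\mathcal{P}(0) + \lambda\mathcal{P}(0) + (\Delta-k-1)\lambda\mathcal{P}(0)}{1-b} - \mathcal{P}(0), \tag{395}$$

and thus the constraints on $\mathcal{P}(0)$ are

$$\frac{1-b}{2k + 2 + 2\lambda(\Delta-k-1) - 1 + b} \le \mathcal{P}(0) \le \frac{1-b}{2k + 2\lambda(\Delta-k) - 1 + b}. \tag{396}$$

Since $V(\mathcal{P})$ is a linear function of $\mathcal{P}(0)$, to minimize the cost $V(\mathcal{P})$, either $\mathcal{P}(0) = \frac{1-b}{2k+2+2\lambda(\Delta-k-1)-1+b}$ or $\mathcal{P}(0) = \frac{1-b}{2k+2\lambda(\Delta-k)-1+b}$, i.e., $\mathcal{P}(0)$ should take one of the two extreme points of (396). At these two extreme points, we have either $\mathcal{P}(k) = \mathcal{P}(0)$ or $\mathcal{P}(k) = \lambda\mathcal{P}(0) = \mathcal{P}(\Delta-1)$.

Therefore, in the optimal probability mass function $\mathcal{P} \in \mathcal{SP}_{\text{pd}}$, there exists $k \in \{0, 1, \dots, \Delta-2\}$ such that

$$\mathcal{P}(i) = \mathcal{P}(0),\ \forall i \in \{0, 1, \dots, k\}, \tag{397}$$
$$\mathcal{P}(i) = \mathcal{P}(\Delta-1),\ \forall i \in \{k+1, k+2, \dots, \Delta-1\}. \tag{398}$$

This completes the proof of Lemma 30.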



F. Step 5 

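In the last step, we prove that although $\lambda \in [e^{-\epsilon}, 1]$, in the optimal probability mass function $\lambda$ is either $e^{-\epsilon}$ or 1, and this will complete the proof of Theorem 11.

Proof: For fixed $k \in \{0, 1, \dots, \Delta-2\}$, consider $\mathcal{P} \in \mathcal{SP}_{\text{pd}}$ with

$$\mathcal{P}(i) = \mathcal{P}(0),\ \forall i \in \{0, 1, \dots, k\}, \tag{399}$$
$$\mathcal{P}(i) = \lambda\mathcal{P}(0),\ \forall i \in \{k+1, k+2, \dots, \Delta-1\}. \tag{400}$$

Since $\sum_{i=-\infty}^{+\infty} \mathcal{P}(i) = 1$,

$$2\,\frac{(k+1)\mathcal{P}(0) + (\Delta-k-1)\lambda\mathcal{P}(0)}{1-b} - \mathcal{P}(0) = 1, \tag{401}$$

and thus

$$\mathcal{P}(0) = \frac{1-b}{2(k+1) + 2(\Delta-k-1)\lambda - 1 + b}. \tag{402}$$

Hence, $\mathcal{P}$ is specified by only one parameter $\lambda$.
The cost of $\mathcal{P}$ is

$$V(\mathcal{P}) = \sum_{i=0}^{\Delta-1} \mathcal{P}(i)w_i \tag{403}$$
$$= \mathcal{P}(0)\sum_{i=0}^{k} w_i + \lambda\mathcal{P}(0)\sum_{i=k+1}^{\Delta-1} w_i \tag{404}$$
$$= \frac{(1-b)\left(\sum_{i=0}^{k} w_i + \lambda\sum_{i=k+1}^{\Delta-1} w_i\right)}{2(k+1) + 2(\Delta-k-1)\lambda - 1 + b} \tag{405}$$
$$= C_1 + \frac{C_2}{2(k+1) + 2(\Delta-k-1)\lambda - 1 + b}, \tag{406}$$

where $C_1$ and $C_2$ are constant terms independent of $\lambda$. Therefore, to minimize $V(\mathcal{P})$ over $\lambda \in [e^{-\epsilon}, 1]$, $\lambda$ should take one of the two extreme points, either $e^{-\epsilon}$ or 1, depending on whether $C_2$ is negative or positive.

When $\lambda = 1$, the probability mass function is uniquely determined, which is $\mathcal{P} \in \mathcal{SP}_{\text{pd}}$ with

$$\mathcal{P}(i) = \frac{1-b}{2\Delta - 1 + b},\ \forall i \in \{0, 1, \dots, \Delta-1\},$$

which is exactly $\mathcal{P}_r$ defined in (66) with $r = \Delta$.

When $\lambda = e^{-\epsilon}$, the probability mass function is exactly $\mathcal{P}_r$ with $r = k+1$.
Therefore, we conclude that

$$V^* = \min_{\{r\in\mathbb{N} \mid 1 \le r \le \Delta\}} \sum_{i=-\infty}^{+\infty} \mathcal{L}(i)\mathcal{P}_r(i).$$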
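
The final minimization over $r$ is a search over $\Delta$ candidates. The following is a minimal sketch, assuming $\Delta = 5$, $\epsilon = 2$, the cost $\mathcal{L}(i) = |i|$, and a truncation of the support to $\{-400, \dots, 400\}$. The step probability mass function is built directly from (399), (400) and (402) with $\lambda = e^{-\epsilon}$ and $r = k+1$ (the case $r = \Delta$ reproduces the $\lambda = 1$ case), rather than from (66), which lies outside this appendix; the names `staircase_pmf` and `cost` are ours.

```python
import math

def staircase_pmf(i, r, delta, eps):
    """Step PMF: r values at level P(0), then delta - r values at level b * P(0),
    repeated with geometric decay b = e^{-eps} per period and mirrored for i < 0."""
    b = math.exp(-eps)
    p0 = (1 - b) / (2 * r + 2 * b * (delta - r) - 1 + b)  # (402) with k = r - 1, lambda = b
    period, offset = divmod(abs(i), delta)
    return p0 * b ** period * (1.0 if offset < r else b)

def cost(r, delta, eps, loss=abs, m=400):
    """Truncated expected cost: sum of loss(i) * P_r(i) over |i| <= m."""
    return sum(loss(i) * staircase_pmf(i, r, delta, eps) for i in range(-m, m + 1))

delta, eps = 5, 2.0
costs = {r: cost(r, delta, eps) for r in range(1, delta + 1)}
best_r = min(costs, key=costs.get)

print(costs, best_r)
```

Only the $\Delta$ values $r \in \{1, \dots, \Delta\}$ need to be compared, which is what makes the optimal discrete mechanism easy to compute for a given cost function.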